ADVANCED CONCEPTS OF APPROXIMATE REASONING

FINAL TECHNICAL REPORT 
Executive Summary 

Enrique H. Ruspini 
Artificial Intelligence Center 
SRI International 


1 Introduction 

This final report consists primarily of a collection of papers that have been published,
presented, or await publication in various forums, presenting the results of research, sponsored by
the U.S. Army Research Office, on an artificial intelligence discipline known as “Approximate
Reasoning.”

This collection includes detailed technical presentations of approximate reasoning
issues [1,6], various summaries of those presentations [4,5,8], and an encompassing overview
of their significance [2] in the context of a unified formal framework, developed as part of the
reported research.

For this reason, we have chosen a format based on inclusion of all papers relevant to our 
research, preceded by this executive summary, which is also intended to guide the interested 
reader to the diverse works that make up the bulk of the report.

The research program on advanced concepts of approximate reasoning had the goal of
establishing firm formal foundations that explain the different technologies proposed to solve
the problems associated with the processing of imprecise and uncertain information, permit
a comparison of their advantages and disadvantages, and, especially, allow the determination
of their applicability to specific problems.

The research results reported herein clarify fundamental aspects of information processing
under conditions of imprecision and uncertainty. These results represent particularly
important steps toward the development of systems for analogical reasoning, i.e., automated 
devices that exploit similarities between scenarios to “extrapolate” from known examples 
into unknown situations. 

Because of their fundamental nature, these results are applicable to a wide variety of
problems of Army interest including intelligence analysis, autonomous device planning and
control, vulnerability analysis, human factors engineering, material analysis, fault diagnosis,
reliability analysis, system design, and mission planning and counterplanning.

On the basis of the nature of the results obtained during this research and current practical
experience with the applicability of various approximate reasoning techniques, it is
possible to identify the following applications as being particularly amenable to treatment
in the near future:

1. Control of unstable systems, such as helicopters, land vehicles, or weapon platforms,
by means of possibilistic control techniques.

2. Control of navigation, target tracking, and obstacle avoidance by autonomous mobile
agents.

3. Elimination of involuntary platform/hand movement in object-tracking tasks.

4. Development of vulnerability measures and related assessments of structural viability.

5. Development of approximate models of complex systems.

6. Coordination of real-time intelligent agents on the basis of considerations about their
usefulness, associated risks, and probability of success.

2 Approximate Reasoning 

Approximate Reasoning is the collective name given to a variety of automated methods and
techniques for the analysis of imprecise and uncertain information. 

The first task in our investigation was to clarify the nature of the approximate reasoning 
problem: a poorly understood question that was felt to be the basic cause of the controversy 
that characterized the state of the art. Prior characterizations of approximate reasoning 
technology broadly interpreted the epithet “approximate” as an indication of either the poor
quality of the underlying knowledge or that of the proposed techniques, considered to be
heuristic imitations of the sounder methods of classical logic.

Our approach to the characterization of the approximate-reasoning problem was based 
on continuation of previous work of the principal investigator (“The Logical Foundations of
Evidential Reasoning,” SRI AIC Technical Note No. 408, 1987), which relied on the logical
notion of “possible world.” The result of these investigations was the development of a
unified framework for the approximate reasoning problem that is briefly summarized in a 
paper presented at the Fourth International Symposium on Knowledge and its Engineering [7] 
and that is considerably expanded in a related assessment of the state of the art and its 
progress [2]. 

Informally speaking, possible worlds are the conceivable situations, scenarios, states, or
behaviors of a real-world system, i.e., the conceivable solutions of a typical situation- or
state-assessment problem. In those problems, we are typically required to state whether the
system in question (e.g., “the weather at Menlo Park”) is (or was, or will be) in such a
state that certain statements (called hypotheses) about it are true (e.g., “... will be rainy on
November 15”).

To answer such questions in the context of a typical reasoning problem, we usually make
various observations of our system (e.g., temperatures, pressures) that, when combined with
existing background knowledge (e.g., meteorology), eliminate certain conceivable possibilities
from consideration. The remaining states, called in our model the evidential set because of
their obvious relationship with observed evidence, are then examined to determine whether a
hypothesis of interest is true in all of them or false in all of them.

If that is indeed the case, as illustrated in Figure 1, then the problem is a conventional
reasoning problem capable of being, at least conceptually, solved by classical logical techniques
(i.e., the evidence implies the hypothesis or its negation).

In an approximate reasoning problem, however, the situation resembles that illustrated
in Figure 2, where the hypothesis is true in some of the possibilities that are consistent with
the evidence and false in others. Faced with this inability to
solve the problem of finding whether a hypothesis is true or false, all approximate reasoning
methods, in one way or another, modify the problem to be solved, concentrating instead on
describing the evidential set in terms of its relationship with the hypothesis of interest.
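The distinction between a conventional and an approximate reasoning problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three weather propositions, the background rule, and the function names are hypothetical and do not come from the reported research.

```python
from itertools import product

# Hypothetical toy system: a possible world assigns a truth value to each
# of three illustrative propositions about the weather.
PROPOSITIONS = ("low_pressure", "cloudy", "rainy")


def possible_worlds():
    """Enumerate every truth assignment over PROPOSITIONS."""
    for values in product((False, True), repeat=len(PROPOSITIONS)):
        yield dict(zip(PROPOSITIONS, values))


def evidential_set(constraints):
    """Keep the worlds that satisfy every observation and background rule."""
    return [w for w in possible_worlds() if all(c(w) for c in constraints)]


def classify(evidence, hypothesis):
    """Return 'true', 'false', or 'undetermined' for the hypothesis."""
    if not evidence:
        raise ValueError("the evidence is inconsistent: no world satisfies it")
    verdicts = {hypothesis(w) for w in evidence}
    if verdicts == {True}:
        return "true"          # conventional problem: hypothesis implied
    if verdicts == {False}:
        return "false"         # conventional problem: hypothesis refuted
    return "undetermined"      # approximate reasoning problem


# Assumed background knowledge: it cannot rain without clouds.
def rain_needs_clouds(world):
    return (not world["rainy"]) or world["cloudy"]


def is_rainy(world):
    return world["rainy"]
```

Observing a cloudless sky, `classify(evidential_set([rain_needs_clouds, lambda w: not w["cloudy"]]), is_rainy)` yields a definite answer, so the problem is conventional. Observing clouds instead leaves both rainy and dry worlds in the evidential set, and the same call reports that the hypothesis is undetermined.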

Probabilistic reasoning methods, illustrated in Figure 3, for example, seek to determine
the proportion of evidential possibilities where a hypothesis is true (i.e., the conditional
probability of truth). This proportion is usually estimated with the aid of statistical tables 
that summarize experience under similar circumstances. 
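As a minimal sketch of this idea, the proportion can be computed from a table of counts. The world labels, the counts, and the function name below are hypothetical; the counts stand in for the statistical tables mentioned above and are assumed to be integers.

```python
from fractions import Fraction


def conditional_probability(weights, evidence, hypothesis):
    """Weighted share of evidential worlds in which the hypothesis holds.

    weights    -- dict mapping each world label to an integer count
    evidence   -- set of world labels consistent with the observations
    hypothesis -- set of world labels in which the hypothesis is true
    """
    total = sum(weights[w] for w in evidence)
    if total == 0:
        raise ValueError("the evidence has zero weight; the ratio is undefined")
    favorable = sum(weights[w] for w in evidence & hypothesis)
    return Fraction(favorable, total)


# Hypothetical counts of past cases that fell in each world.
EXAMPLE_COUNTS = {"w1": 30, "w2": 10, "w3": 20, "w4": 40}
```

With the evidence restricted to `{"w1", "w2", "w3"}` and the hypothesis true in `{"w1", "w4"}`, only `w1` is both evidential and favorable, so the result is 30 out of 60 cases.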

Possibilistic reasoning methods, on the other hand, rely on measures of resemblance and
similarity to determine, as illustrated in Figure 4, to what extent evidential possibilities
resemble, or are close to, the set of possibilities where the hypothesis is true. The similarity
measure that makes such a characterization possible is intended to be a measure of the
extent to which facts that are true in one situation or scenario are true in another. For
example, assessments of the stability of a weapon platform under some assumptions will 
remain approximately valid for similar platforms. 
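The following sketch shows one way such a description could be computed when each possible world is summarized by a single number. The similarity function, its scale, and the two aggregate degrees (worst-case and best-case resemblance) are assumptions chosen for illustration; they are not necessarily the definitions used in the reported research.

```python
def similarity(a, b, scale=10.0):
    """Hypothetical resemblance in [0, 1]: 1 for identical states,
    falling linearly to 0 once the states differ by `scale` or more."""
    return max(0.0, 1.0 - abs(a - b) / scale)


def best_case_resemblance(evidence, hypothesis):
    """Resemblance of the closest pair (evidential world, hypothesis world)."""
    return max(similarity(e, h) for e in evidence for h in hypothesis)


def worst_case_resemblance(evidence, hypothesis):
    """Every evidential world is at least this similar to some world
    in which the hypothesis is true."""
    return min(max(similarity(e, h) for h in hypothesis) for e in evidence)
```

For evidential states `[72, 74, 78]` and hypothesis states `[75, 76, 77]`, the farthest evidential state (72) lies three units from the hypothesis set, so the worst-case resemblance is about 0.7, while the best case is about 0.9. Unlike a probability, neither number depends on how many worlds fall in each set, only on how close they are.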



Figure 1: The conventional reasoning problem. 


3 Possibilistic Reasoning 

Having in the past successfully utilized possible-world models to describe the conceptual 
bases of probabilistic reasoning and its generalizations, notably the Dempster-Shafer calculus 
of evidence, our attention during the reported research was primarily focused upon the formal 
characterization of possibilistic (i.e., “fuzzy logic”) methods according to the similarity-based 
model that is briefly described above. 

The major result of this research was a semantic model that was summarized in a number
of publications and presentations [1,7,8,9,10] and that is discussed in detail in a technical
note [1], soon to be published in the International Journal of Approximate Reasoning.

The major characteristics of this model are: 

• its ability to describe possibilistic techniques as the result of imposing metric structures
upon a set of possible worlds rather than as the consequence of defining certain set
measures (i.e., probabilities) on that set,

• its characterization of the metric properties of similarity or resemblance functions using
operators previously considered only in the context of multivalued logics and the theory
of probabilistic metric spaces (i.e., triangular norms),

• its description of metric relations between pairs of possible states or scenarios using
well-known topological concepts (i.e., the Hausdorff distance), and the identification of
relationships between such notions and the notions of unconditioned and conditional
possibility distributions,

• its validation of the generalized modus ponens (the major inferential procedure of
fuzzy logic) as a generalization of its classical counterpart, and

• its ability to provide cogent descriptions of approximate relations between system variables.
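The role of triangular norms can be illustrated with a short sketch. It assumes the usual transitivity condition S(x, z) >= T(S(x, y), S(y, z)), which plays the part of a triangle inequality for a similarity function S and a triangular norm T; the similarity function and the sample worlds are hypothetical.

```python
def t_min(a, b):
    """Minimum triangular norm."""
    return min(a, b)


def t_product(a, b):
    """Product triangular norm."""
    return a * b


def t_lukasiewicz(a, b):
    """Lukasiewicz triangular norm."""
    return max(0.0, a + b - 1.0)


def similarity(a, b, scale=10.0):
    """Hypothetical similarity derived from an ordinary distance."""
    return max(0.0, 1.0 - abs(a - b) / scale)


def is_t_transitive(sim, worlds, t_norm, tol=1e-12):
    """Check sim(x, z) >= T(sim(x, y), sim(y, z)) for every triple."""
    return all(
        sim(x, z) + tol >= t_norm(sim(x, y), sim(y, z))
        for x in worlds
        for y in worlds
        for z in worlds
    )
```

On the worlds `[0, 4, 8]` this distance-based similarity is transitive under the Lukasiewicz norm, which mirrors the ordinary triangle inequality, but not under the minimum or product norms: S(0, 8) is about 0.2 while S(0, 4) and S(4, 8) are both 0.6. The choice of norm therefore fixes which kind of metric structure is imposed on the possible worlds.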

Ongoing research, to be reported in the immediate future, is currently concerned with 
the following issues: 

• Derivation of similarity measures from possibility measures. The semantic model described
above has clearly established that possibilistic logic procedures rely on notions
of similarity between plausible states of the world rather than on measures of the
relative likelihood of such possibilities. While this model was developed primarily to
improve understanding of fundamental conceptual matters, the relations that were uncovered
during such development have significant implications of a practical nature. Of
particular importance is the potential ability to derive similarity measures (the bases
for such analogical processes as case-based reasoning) from possibility distributions,
the formal expression of important qualitative physical laws. We have developed initial
formulations for the derivation of such similarity measures on the basis of a formal result
of L. Valverde on the representation of similarity measures.

• The role of the notion of negation in possibilistic logic. Conventional modal logics are
concerned with the qualification of the truth of propositions by describing such truth
as being either necessary (i.e., the unavoidable consequence of basic assumptions and
the rules of logic), or contingent (i.e., the consequence of assumptions applicable to the
particular situation under consideration). These considerations are the bases for the
concepts of possibility and necessity, which are related by a straightforward duality relation
(based on the notion of negation) stating that something is possible if its negation is
not necessary.
Our model of the semantics of fuzzy logic, while introducing graded (i.e., relative)
notions of possibility and necessity based on measures of similarity, did not relate such
notions using the concept of negation. Identification of such a relationship, however,
is of significant conceptual and practical importance as knowledge of what is possible
under certain circumstances may yield important information as to what is necessary
in other cases.

Study of duality relations between pairs of subsets of possible worlds has led to the
definition of new concepts of negation that are closely associated with the relations
that exist between linguistic qualifiers that are antonyms of each other (e.g., {rich,
poor} rather than {rich, not-rich}).

• The study of the roles of system variables and concepts of independence in possibilistic
logic. We have studied the relations between similarity functions defined from the
joint viewpoint of several variables (e.g., as when objects are differentiated using multiple
attributes such as color, volume, shape) and marginal similarities that take
into account only certain subsets of variables (e.g., measures of resemblance based solely
on color). We have derived initial formulations for the derivation of joint similarity
measures from their marginals and vice versa.

Furthermore, study of the relationships that hold between similarity measures defined
from diverse viewpoints has led to the definition of possibilistic measures of independence
(or interaction) between variables. The results, which will be reported in a
technical note that is currently under preparation, are of major practical importance
to simplify complex processes of possibilistic inference (i.e., providing a possibilistic
counterpart to the probabilistic methods of network decomposition). We are currently
investigating representation formulas to derive marginal similarity functions without
having to resort to transitive extension (i.e., chaining) of certain nontransitive relations.
Availability of such formulas will greatly improve the efficiency of inferential
processes.

• We are also investigating the conceptual relations between the important decision-theoretic
notions of utility, cost, desirability, and preference. The central idea, based
on concepts proposed by Rescher (N. Rescher, “Semantic Foundations for the Logic
of Preference,” in N. Rescher, editor, The Logic of Decision and Action, Pittsburgh,
1967), is that such notions may be logically formalized by measures that quantify
our preference to be in certain states of the world rather than others. Preliminary
results indicate that a utility-based model will provide an even broader formal basis
for possibilistic logic, while relating such preference measures with the metric structures
of our basic model.

• We have developed a possibilistic formulation for the control of the navigation and for
obstacle avoidance by autonomous vehicles that is currently being tested in the context
provided by the SRI Autonomous Mobile Agent Platform.






4 Probabilistic Reasoning 

We have also continued to investigate various issues of probabilistic reasoning, focusing upon 
questions of validity and generality of the Dempster-Shafer calculus of evidence. 

We have given special attention to the discussion of recent concerns, raised within the 
technical community, about the conceptual soundness of this approach. Our contribution 
to this exchange, intended primarily to clarify various confusions and misconceptions, was 
summarized in a paper presented at the Third International Conference on the Management 
of Imprecision and Uncertainty by Expert Systems [5], which is expanded upon in an
unpublished manuscript [6], currently under submission, that is enclosed as part of this final
report.

We have also continued our previous research on generalized probabilistic methods, emphasizing
the study of issues related to the treatment of conditional and dependent evidence.
We have determined that, for reasonable definitions of conditional evidence distributions in
the context of the Dempster-Shafer (DS) calculus of evidence, these distributions are such that
their combination with unconditioned evidence usually results (even for simple examples) in probability
bounds that cannot be expressed within the Dempster-Shafer framework.

In connection with these investigations, we have derived a preliminary formulation of the
problem of combination of conditional and unconditioned distributions as a linear program.
In general, however, the solutions of such a problem will not obey the axioms of the calculus
of evidence. Currently, we are focusing our attention upon three major questions:

• the determination of cases where the result of evidential conditioning is a belief function,

• the approximation of results not satisfying evidential axioms by belief functions that
do, and

• the development of a more general evidential calculus based on the notion of lower and
upper probabilities.
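For readers unfamiliar with the probability bounds mentioned above, the sketch below computes the standard belief and plausibility of a hypothesis from a basic mass assignment, which act as lower and upper probabilities. It does not reproduce the linear-programming formulation of the reported research; the frame of discernment and the masses are hypothetical, and the masses are assumed to sum to one with no mass on the empty set.

```python
from fractions import Fraction


def belief(mass, hypothesis):
    """Lower bound: total mass of focal sets contained in the hypothesis."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)


def plausibility(mass, hypothesis):
    """Upper bound: total mass of focal sets that intersect the hypothesis."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)


# Hypothetical frame and mass assignment (assumed to sum to one).
FRAME = frozenset({"rain", "cloud", "sun"})
EXAMPLE_MASS = {
    frozenset({"rain"}): Fraction(3, 10),
    frozenset({"rain", "cloud"}): Fraction(5, 10),
    FRAME: Fraction(2, 10),            # mass left uncommitted
}
```

For the hypothesis `frozenset({"sun"})`, no focal set lies inside it and only the whole frame intersects it, so the probability of sun is bounded between 0 and 1/5.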
Abstract


This note presents a personal view of the state of the art in the representation 
and manipulation of imprecise and uncertain information by automated processing 
systems. To contrast their objectives and characteristics with the sound deductive 
procedures of classical logic, methodologies developed for that purpose are usually 
described as relying on Approximate Reasoning. 

Using a unified descriptive framework, we will argue that, far from being mere 
approximations of logically correct procedures, approximate reasoning methods are 
also sound techniques that describe the properties of a set of conceivable states of a 
real-world system. This framework, which is based on the logical notion of possible 
worlds, permits the description of the various approximate reasoning methods and 
techniques and simplifies their comparison. More importantly, our descriptive model 
facilitates the understanding of the fundamental conceptual characteristics of the 
major methodologies. 

We examine first the development of approximate reasoning methods from early 
advances to the present state of the art, commenting also on the technical motivation 
for the introduction of certain controversial approaches. 

Our unifying semantic model is then introduced to explain the formal concepts and 
structures of the major approximate reasoning methodologies: classical probability 
calculus, the Dempster-Shafer calculus of evidence, and fuzzy (possibilistic) logic. 
In particular, we discuss the basic conceptual differences between probabilistic and 
possibilistic approaches. 

Finally, we take a critical look at the controversy about the need for, and utility of,
diverse methodologies, and assess requirements for future research and development.
1 Introduction


This note presents a personal view of the state of the art in approximate reasoning, 
the name used to describe several methodologies for the development of intelligent 
systems capable of manipulating imprecise and uncertain information. 

Approximate reasoning techniques loosely based on the calculus of probability 
appeared almost simultaneously with the development of expert systems relying on 
classical (i.e., two-valued) logic techniques. Soon after these systems were introduced, 
other approaches to the treatment of uncertainty and imprecision were also proposed, 
both to generalize more or less conventional probabilistic schemes and to capture other 
aspects of imperfect knowledge, claimed to have a nonprobabilistic nature. 

The short technological history of approximate reasoning methods may be described
as being, from that moment, one of extreme controversy that has lasted to
this day. Most of the proponents of classical probabilistic treatments, often described,
although vaguely and somewhat misleadingly, as Bayesians,¹ have doubted the necessity
for the introduction of other conceptual structures and have often sought to
explain those frameworks in terms of probabilistic notions. Proponents of alternative
approaches, on the other hand, have defended their techniques on the strength of
two main arguments: the practical problems associated with the parameter-intensive
procedures of conventional probability, often demanding knowledge of a large number
of probability values; and the nonprobabilistic nature of the uncertainties associated
with the use of vague concepts.

Much of this disagreement has been clearly caused by misunderstandings about 
the fundamental philosophical characteristics of each approach. Lacking a suitable 
basis to interpret certain concepts, particularly those related to the “degrees of truth”
of multivalued logics, it has been impossible, until recently, to provide an adequate 
framework to discuss fundamental issues in a rational manner. 

This position paper on the past evolution of the field, its present state of the art,
and desiderata for future evolution is the result of recent research by the author in
basic semantic issues that are germane to the foundations of approximate reasoning.
The presentation is based on the use of a central unifying framework: a formal model
of the approximate reasoning problem that explains the similarities and differences
between major methodologies. Using this “possible-worlds” model, we will also be
able to compare the rationale of nonmonotonic logic approaches with that of approximate
reasoning procedures. Although our model is a rigorous formalism, described
in detail elsewhere [32,33] in connection with the logical foundations of the
Dempster-Shafer calculus of evidence and fuzzy logic, our discussion will be kept as informal as
possible to facilitate understanding our philosophical and technical position.

¹ The qualifier Bayesian is used in the context of statistics to describe proponents of a statistical
methodology and in the context of the philosophy of probability to denote various subjective views
of probability. In Artificial Intelligence, the term has been loosely applied both to those investigating
approaches based on the probability calculus and, more narrowly, to those espousing the
decision-theoretic methods of subjective probability.

We will contend that regarding probabilistic and possibilistic approaches as competing
alternatives is incorrect and confuses the need to describe different aspects of
reality with the adequacy or ability of probability as a measure of likelihood. We will
also take a critical look at the major claims supporting a narrow view of probability,
based on a subjectivist interpretation that regards all forms of rational decision-making
as necessarily demanding optimization of expected-utility functionals, and
we dispute claims that only such approaches are endowed with either a suitable or a
proven decision-theoretical apparatus.

On the basis of our theoretical arguments, and of recent success in the application
of various techniques to practical problems, we will also argue that future
accomplishment in the field lies in the rational development of tools leading to multiple
complementary views of the implications of evidence rather than on arbitrary
circumscription to a limited class of techniques and procedures.

2 The Development of Approximate Reasoning 

Intelligent systems relying on approximate reasoning techniques [8,39] appeared in the
1970s, at approximately the same time as other systems seeking to emulate the expertise
of specialists in diverse fields of endeavor. Problems related to the development
of the expert systems based on classical deductive procedures, however, were primarily
related to the need to organize knowledge and its processing in such a manner
as to assure an efficient derivation of the truth value of hypotheses (i.e., either true
or false). Systems such as MYCIN or PROSPECTOR, which reasoned about medical
and geological systems, where knowledge is limited and where observations may be
difficult or impossible to make, were forced to deal, in addition, with issues that, to
this day, have almost completely consumed the attention of approximate reasoning
researchers.

These issues may be generally described as related to the extension of the basic
derivation rule of classical logic, the modus ponens, which states that from the validity
of an antecedent proposition $p$ and that of the implication $p \rightarrow q$, it is possible
to derive the validity of the consequent proposition $q$. Although a conventional expert
system, using classical rules of derivation, could be assumed to have sufficient
information to derive the validity of a hypothesis of interest, whenever knowledge
was scarce or uncertain it was necessary to resort to other schemes that qualified
in one way or another the meaning of the truth of propositions. Still imitating the
network-oriented techniques of truth-value propagation of two-valued logic, the approximate
reasoning schemes developed in early systems sought to propagate numeric
truth values that were loosely related to probabilistic interpretations of uncertainty.
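The classical propagation scheme that these early systems imitated can be sketched as repeated application of modus ponens over a set of rules. The rule base and proposition names below are hypothetical, and the sketch handles only single-antecedent rules with two-valued truth.

```python
def forward_chain(facts, rules):
    """Repeatedly apply modus ponens: from p and (p -> q), derive q.

    facts -- iterable of propositions known to be true
    rules -- list of (antecedent, consequent) pairs
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived


# Hypothetical two-rule chain.
EXAMPLE_RULES = [("fever", "infection"), ("infection", "treatment_indicated")]
```

Starting from the single fact `"fever"`, truth flows along both rules and all three propositions are derived. Starting from no facts at all, nothing is derived, which is the situation that forced early systems to attach graded values to propositions instead of plain truth.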

The concept of probability provides a most important tool to describe the state of 
systems that are known under less than desirable informational circumstances. Arising
clearly from the need to make decisions despite undesirable knowledge handicaps,
the notion of probability, seriously studied from the seventeenth century, has always 
played a major role in human judgment [16]. 

The appeal of probability as an instrument to assess system behavior is due to the
empirically observed property that is expressed by the long-run stability of occurrence
of certain events. Whether such a pattern of occurrence has been objectively quantified
through experimentation or historical observation (objective interpretation), or
is subjectively expressed by the willingness to gamble with certain stakes (subjective
interpretation), it is clear that it provides a sound basis to formulate rational
expectations about system state. If such predictable stability
of occurrence could not be assured, and the real world defied any attempt at descriptive
characterization, why would anybody be willing to consciously bet on some outcomes
rather than others?

Curiously enough, although probabilistic interpretations were always implicitly or 
explicitly intended by the developers of early approximate reasoning systems, and 
while the underlying calculi reflect such explanations, it seems also clear that the 
machinery of these devices was primarily oriented toward the emulation of the propagation
schemes of classical logic with truth flowing from node to node through edges
corresponding to implication rules. Approximate truth, measured by numbers associated
with objective likelihood or expert confidence, also flowed from evidence to
hypothesis in a scheme that generalized the true-false dichotomy of multivalued logic. 

Regardless of the clearly intended probabilistic interpretations of those numbers, 
misgivings about their meaning and utility were sufficient to plant the seeds of the 
ensuing controversy. Concerns about the inability of probability to capture notions of 
evidential confirmation led the developers of MYCIN [39], for example, to introduce
modified concepts (“certainty factors”) as an alternative to direct use of conditional
probabilities. In spite of subsequent studies showing that such certainty factors were 
related to probability values [18], it is clear that these worries were well founded, 
having been already eloquently expressed in the works of philosophers of science [34]. 

Although such concerns are indeed important and, despite some claims to the
contrary, must still be properly addressed, other issues soon captured the attention
of those seeking to develop expert systems with approximate reasoning capabilities.
Beyond certain troublesome issues that were apparent when formulating the probabilistic
calculi used by PROSPECTOR, arising from inconsistencies between “expert
estimates” of probability values and the laws of probability, it was also clear to those
engaged in the development of new expert systems that a typical application required
estimation of a very large number of individual probability values [14], which were
neither available nor derivable from existing data.

In addition, other researchers, acquainted with the concepts and methods of multivalued
logic [31,13], advanced the notion that some of the “degrees of truth” being
propagated could be interpreted in a nonprobabilistic fashion. The theory of fuzzy
sets, introduced by Zadeh in 1965 [45], had been for some time the focus of attention
of these researchers and soon became a major source of techniques for the treatment
of uncertainty by use of nonprobabilistic schemes.

The variety of approximate reasoning methods arising from this diversity (expressed
as a preference toward either a variedly interpreted, more or less strict application
of classical probability schemes; as approaches seeking the expression of ignorance
about probability values, such as the Dempster-Shafer calculus of evidence; and as
nonprobabilistic schemes like fuzzy logic) has led to a controversy that has endured
to this day.

It has not been possible, until recently, to discuss these approaches with the help 
of a unifying framework that facilitates the interpretation of relevant concepts and the 
comparison of alternative methodologies. This unifying framework is based on a view 
of approximate reasoning problems as those wherein the truth-value of a hypothesis 
cannot be deduced from available information.² In other words, several scenarios, all
consistent with evidence, may be conceived. In some of those situations the hypothesis
is true, while in others it is false.

The logical notion that we will use to characterize such conceivable states of affairs, 
situations, or scenarios, is the concept of “possible world” utilized by Carnap [4]
in his logical treatment of the concept of probability, which was also employed by 
Nilsson [26] to derive a logic-based methodology for probabilistic reasoning. 

3 Possible-World Models 

A possible world may be briefly described as a function that assigns one and only one
of the truth values true or false to every proposition (i.e., declarative statement)
about the system that is being reasoned about. If we seek to describe and study
the weather in Menlo Park, for example, the atmospheric conditions at several points
in time are described by assigning specific values to meteorological variables such as
temperature, humidity, and rainfall, or, equivalently, by assigning a truth value to
propositions such as


The temperature at 3PM was 75°F.

² Sometimes this characterization is extended to include those cases where that derivation is very
difficult.

Since the value of system variables is unique (e.g., the temperature cannot be both 
75°F and 85°F at the same time), it is clear that each possible world (i.e., an assignment
of truth values) must satisfy certain consistency conditions that follow from the
axioms of classical logic. 
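The consistency conditions can be made concrete with a small sketch. The three propositions about the 3 PM temperature and the function names are hypothetical; the point is only that not every truth assignment qualifies as a possible world.

```python
from itertools import product

# Hypothetical propositions about a single variable, the 3 PM temperature.
PROPOSITIONS = ("temp_is_75F", "temp_is_85F", "temp_above_80F")


def is_consistent(world):
    """Constraints that follow from the temperature having a single value."""
    if world["temp_is_75F"] and world["temp_is_85F"]:
        return False    # one variable cannot take two values
    if world["temp_is_75F"] and world["temp_above_80F"]:
        return False    # 75 is not above 80
    if world["temp_is_85F"] and not world["temp_above_80F"]:
        return False    # 85 is above 80
    return True


def consistent_worlds():
    """Return the truth assignments that qualify as possible worlds."""
    assignments = (
        dict(zip(PROPOSITIONS, values))
        for values in product((False, True), repeat=len(PROPOSITIONS))
    )
    return [w for w in assignments if is_consistent(w)]
```

Of the eight truth assignments over these three propositions, only four survive the check. They correspond to a temperature of 75°F, a temperature of 85°F, some other temperature above 80°F, and some other temperature at or below 80°F.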

In approximate reasoning problems, however, we can usually do more to restrict 
the extent of the set of possible worlds that may conceivably describe the state of 
the system. Typically, the information or knowledge about the state of the system 
and its applicable rules of behavior, in spite of its deficiencies, is a major source of 
constraints that further limit the extent of the situations that must be considered. 
The subset of possible worlds that is logically consistent with this evidence is called
the evidential set, and, in one form or another, is the concern of every approximate
reasoning approach. In any approximate reasoning problem, by definition, a hypothesis
is true in some of these evidential worlds and false
in others, as depicted in Figure 1.




Figure 1: The approximate reasoning problem. (Legend: worlds consistent with the evidence; worlds logically inconsistent with the evidence. Regions: hypothesis true; hypothesis false.)

The view of approximate reasoning problems that is afforded by this possible-world
perspective also simplifies the understanding of the objective of approximate
reasoning approaches. Lacking, by the nature of the problem, the ability to determine
from the evidence whether we are in a situation where a hypothesis is true or in
one where it is false, every approximate reasoning methodology seeks answers to a
different problem: that of describing certain properties of the evidential set.

4 The Semantics of Approximate Reasoning 

Our view of approximate reasoning methods as techniques to describe the evidential
subset³ e of possible worlds that are consistent with available information now allows
a more detailed look into their philosophical bases.

Probabilistic methods , regardless of their subjective or objective semantics, seek 
to estimate measures of the subsets of the evidential set where a hypothesis h is true 
and where it is false, i.e., the values 

$$p(h \wedge e) \quad \text{and} \quad p(\neg h \wedge e),$$

or other related quantities, such as likelihood ratios or conditional measures with
respect to the evidential set e. The measure p is, however, an aggregate measure of
set extension based on the additive law

$$p(p) + p(q) = p(p \wedge q) + p(p \vee q),$$

stating that its value over a set may be derived from knowledge of its value over a 
partition of nonintersecting subsets. Regardless of the mechanism used to derive the 
weights associated with individual members of the subsets, it should be clear that
interactions and associations between possible worlds (e.g., distances) do not play 
any role in such quantities. Simply stated, all that matter are the weights of each 
individual point (more generally, each atomic subset) that are then added to gauge 
the extent of the subset. 
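
The additive law can be checked on a toy example. This is a minimal sketch under stated assumptions: the four worlds, their weights, and the two propositions are hypothetical, and propositions are identified with the sets of worlds where they hold.

```python
from fractions import Fraction

# Hypothetical weights attached to four individual possible worlds.
weights = {"w1": Fraction(1, 2), "w2": Fraction(1, 4),
           "w3": Fraction(1, 8), "w4": Fraction(1, 8)}

def measure(worlds):
    """Aggregate measure of a set of worlds: the sum of its point weights."""
    return sum((weights[w] for w in worlds), Fraction(0))

p_worlds = {"w1", "w2"}   # worlds where proposition p holds
q_worlds = {"w2", "w3"}   # worlds where proposition q holds

# Conjunction is set intersection, disjunction is set union.
lhs = measure(p_worlds) + measure(q_worlds)
rhs = measure(p_worlds & q_worlds) + measure(p_worlds | q_worlds)
print(lhs, rhs)
```

Only the individual weights enter the computation; no notion of distance between worlds is used anywhere.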

Possibilistic methods, on the other hand, are based on notions of proximity and 
resemblance between pairs of possible worlds. This association or similarity is also a 
measure, albeit not one that may be expressed in terms of individual weights.
Exploiting the idea that, in many systems, statements that are true in certain situations
remain approximately true in similar instances (e.g., clothing that is appropriate when 
the temperature is 75°F will work nearly as well at 78°F), the purpose of possibilistic 
techniques is to describe the evidential set in terms of the similarity of its component 
possible worlds to other possible worlds used as reference landmarks. 

The basic difference between probabilistic and possibilistic methods, therefore,
goes beyond the use of different formulas to derive truth values. The methodologies
are based on different conceptual approaches to the description of the evidential set.
Probabilistic reasoning stresses relative measures of set size, such as the ratio of
previously observed true and false cases, while possibilistic reasoning stresses
binary measures of similarity that describe how far any conceivable scenario is from
certain significant situations.

3 For simplicity, we refer loosely to sets and propositions as if they were the same objects.

In both approaches, however, the objective is the description of properties of the
evidential set rather than of any of its particular members. By contrast, certain
nonmonotonic logic techniques such as circumscription [24] rely on methods to choose
least-exceptional worlds in the evidential set by extension of the “closed-world
assumption” [30], i.e., the only propositions or predicates that are true are those that
are known to be true. These techniques may be considered general procedures to
represent states of evidential knowledge by choice of prototypical situations. New
evidence, however, may force retraction of some of the assumptions, leading to the
selection of other evidential worlds as prototypes. Another class of nonmonotonic
reasoning techniques, while generally fitting the description given above, relies on
prespecified “default” rules [29] to control the choice of prototypical worlds. Since
these rules are usually formulated on the basis of plausibility notions rooted in
statistical information (as in the famous example of Tweety and the flying ability of
most live birds), it is not surprising that the derivation techniques and rules of these
preferential logics (a name indicating their definition of a preferred order for models
of a situation) resemble those of probabilistic reasoning. In fact, recent developments
strongly point to the existence of a common unifying interpretation for both [28,15].

4.1 Probabilistic Reasoning 

There can be little argument from any quarter that frequencies of occurrence of events
satisfy the famous additive law that is axiomatized in the definition of set measure [17].
If propositions that describe event occurrence can be assigned one and only one
of the classical truth values, then it is obvious that, whenever such repetitive
occurrences are counted, the positive and negative occurrences must add
up to the total number of relevant cases. As far as this objectivist interpretation
of probability is concerned, therefore, there is little doubt that classical formalisms
provide a suitable conceptual tool to capture the behavior of systems that expresses
itself, as experimentally observed, in the form of stable frequency values.

Probabilities, viewed from the perspective of our possible-worlds model, may be 
considered as the basis of methods providing answers to a question that is related to 
but different from the undecidable issue of the validity of a hypothesis. Unable to 
state, because of lack of information, that h is either true or false, we describe instead 
the behavior of the system in the long run, by calculating the frequency of occurrence 
under similar circumstances.

Probabilistic reasoning schemes may be generally described as concerned with the
computation of the joint probability distribution of several system variables, based
on knowledge of the values of related marginal and conditional probability
distributions. Whenever the required values are available it is possible, conceptually at least,
to derive the required joint distributions. In fact, it may be fairly stated that, once
it was understood that such derivation should be the goal of probabilistic reasoning
systems, the attention of proponents of that methodological perspective has been
almost completely directed toward the development of methods to simplify the required
knowledge organization and manipulation [27].

Substantial concerns arise, however, regarding what must be done when the needed 
probability values are not known. In applied science, when unknown systems and
phenomena are investigated, experiments are designed and performed to determine the
basic laws of system behavior, which are typically expressed through quantitative 
relationships. If, based on such knowledge, rational courses of action are chosen, the 
careful scientist is then able to explain and justify his decisions on the basis of a strong 
epistemological apparatus supported both by empirical observation and by rational 
deduction. This scheme, which proceeds from information acquisition to decision 
making, embodies the experimental method of modern science. From such a
perspective, probabilistic laws describe certain aspects of system behavior, characterized by
parameters that are estimated using the same methods that are universally accepted 
and employed in applied science. 

Another view of probability, however, regards probability values as expressions of 
the degree of belief of rational decision makers regarding the validity of hypotheses. 
This degree of belief is quantified by the amount of money that a rational gambler 
is willing to bet in a gamble where the payoff, if the unknown truth value turns out 
to be true, is $1. The probabilistic behavior of these degrees of belief is justified 
by a number of axiomatic systems [6,35] providing formal support not only to this 
subjectivist interpretation of probability but also to a decision-making methodology 
based on the maximization of expected utility. Related axiomatic formulations have
also been developed to support the contention that the only correct procedure for
updating such beliefs is the Bayes-Laplace rule [5]:


$$\mathrm{Prob}(q \mid p) = \frac{\mathrm{Prob}(p \mid q)\,\mathrm{Prob}(q)}{\mathrm{Prob}(p)}.$$
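
As a worked instance of the Bayes-Laplace rule, the following sketch updates a degree of belief in q after observing p. The numerical values are hypothetical, and Prob(p) is expanded by the law of total probability, a standard step that the text does not spell out.

```python
from fractions import Fraction

def bayes_update(prior_q, prob_p_given_q, prob_p_given_not_q):
    """Return Prob(q|p) from Prob(q), Prob(p|q), and Prob(p|not q)."""
    # Law of total probability: Prob(p) = Prob(p|q)Prob(q) + Prob(p|~q)Prob(~q).
    prob_p = prob_p_given_q * prior_q + prob_p_given_not_q * (1 - prior_q)
    return prob_p_given_q * prior_q / prob_p

# Hypothetical values: prior 1/10, likelihoods 9/10 and 1/5.
posterior = bayes_update(Fraction(1, 10), Fraction(9, 10), Fraction(1, 5))
print(posterior)
```

With these inputs the belief in q rises from 1/10 to 1/3. The update is fully determined only because every required probability value was supplied.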


A number of researchers have questioned, in the past, the purportedly rational
nature of these axiomatic systems. Their misgivings, which we share, arise both from
questions about the rationality of some specific axioms, as noted by Suppes [42], and
from observation of the behavior of rational decision-makers (including developers of
the axiomatic formalisms) that contradicts the sure-thing principle, as observed
by Allais [1] and Ellsberg [11]. Kyburg [21] has also raised substantial concerns about
the epistemological status and soundness of the subjectivist approach. The axiomatic
system of Cox has also been criticized for its assumption that beliefs are measured by a
single number [10] and, again, for the less-than-natural character of some axioms [38].

Proponents of this stringent orthodoxy have often argued that behavior departing 
from their theoretical requirements, however prevalent, is actually irrational. Such a 
claim, however, suffers from a fundamental methodological flaw. Rationality should 
be defined in terms of basic requirements that demand proper consideration of two 
fundamental factors: observed empirical evidence and the laws of logic. By requiring 
compliance with certain basic tenets of rational behavior, such as the famous
avoidance of “Dutch books,” subjectivist schemes certainly attempt to meet one of these
requirements, albeit in a limited fashion, as pointed out by Kyburg [21]. By defining
rational behavior as that which results from utilization of the proponent’s favorite 
scheme, the characterization of rationality is subjected to a curious argument that 
inverts the identity of what is rational with what must be done to ensure rational 
behavior. This inversion effectively ensures that the expected utility approach would 
always be considered to be rational: in fact, if any other behavior is observed, it 
would be, by definition, irrational. 

This inversion of premises and conclusions is also apparent in other arguments, 
based on pragmatic necessity considerations, for the superiority of the subjectivist 
approach. If decisions, even those to obtain more information, must be made, then 
the elements required to make the decision (i.e., utility functions and degrees of
belief) must be assessed. Conversely, any decision implies that such values have been, 
whether knowingly or not, chosen in some form or fashion. As a result of this close 
relation between the assessment of situations and the selection of suitable courses of 
action, guaranteed by the fact that values of expected utilities (i.e., numbers) may 
always be totally ordered, it is claimed that the subjectivist approach is the only 
one among approximate reasoning methods that has a rational decision-theoretic 
apparatus. 

As appealing as such claims may be to some decision-makers, we must note again
a curious exchange of roles in the scientific discovery process: decisions no longer
follow from empirical observation and rational cogitation; rather, parameters that
describe knowledge follow from a practical need to choose suitable actions. However
pressing the need to derive decisions may be, it should be clear that, in the absence of
information, it is usually impossible to determine what is the best course of action.
Any randomizing device would, under such circumstances, provide a total ordering of
possible choices, but there is very little to assure us that any behavior based on such
an arbitrary basis ought to be called rational.

The ultimate goal of an intelligent system is to take actions based on knowledge
about the actual rather than the believed behavior of a real-world system. It is
difficult to see why, as noted by Kyburg [22], the latter should be given much attention
outside psychological research. If applied science is, as generally admitted, a rational 
enterprise that seeks to uncover the secrets of the universe and to provide guidelines 
to take actions based on such knowledge, then it is clearly desirable that intelligent 
agents, in their quest for similar objectives, follow as closely as possible the essential 
procedures of the scientific method. The ability to produce decisions regardless of 
the extent and pertinence of available knowledge should be regarded as a handicap 
rather than as an advantage of a procedure: a fact readily noticed by those engaged in 
the solution of important real-life problems [12]. As we pointed out before, whenever
such knowledge is acquired, it is typically reported using a format that emphasizes 
the quality of the observational method and the strength of the arguments leading 
from empirical data to the author’s conclusions rather than on the basis of personal 
confidence expressed by willingness to take gambling risks. 

I have made a rather long exposition about the dichotomy between subjectivist 
and objectivist approaches to probability primarily because I believe this to be a 
major cause of a controversy that, beyond considerations that are solely germane to 
probabilistic reasoning, extends to the need for techniques that are not directly based 
on subjectivist orthodoxy. I have also been motivated by the desire to clearly expose 
a personal position that is shared by many in the approximate reasoning community 
but that is also often misleadingly described as being antiprobabilistic. 

Far from being antagonistic to one approach for the simple sake of promoting
others, my eclectic view is the direct result of practical experience with the development
of models of complex systems, and of close familiarity with the application of
mathematics to technological problems. Probability is indeed a powerful tool to describe
chance-related aspects of the behavior of real-world systems. Recent contributions of
probabilists and decision scientists, within and without the context of AI, such as the
development of network-oriented procedures for probabilistic reasoning [27], are most
important additions to our methodological arsenal.

There are, however, limitations on the capabilities of any tool, whether for system
analysis or for any other purpose. As is true of any tool, including all
methodologies described in this note, the applicability of probability is limited by its inability
to perform functions that lie outside its scope, and by practical constraints on our
ability to use it in specific situations. In spite of its unquestionable utility, other
approaches also play a significant role in the description of the possible state of affairs.
These techniques must not be considered to be competitors of probability but, rather,
complementary techniques to enhance the understanding of the real world.

4.2 Generalized Probabilistic Reasoning

Those who worry about the potential lack of applicability of techniques based on
conventional probability formalisms do not question the conceptual validity of probability
as the appropriate tool to measure the frequency of occurrence of diverse events under
various conditions or, in some cases, the strength of belief of decision-makers.
Concerns about the problems caused by ignorance of probability values, however, have
been expressed continuously since the nineteenth century by such prominent
logicians as George Boole [3], and have led to the development of approaches to represent
probabilistic ignorance by using subsets of possible probability values.

If, for example, the probability of validity of a proposition p is unknown, an
interval probability method will represent such ignorance by assigning the interval
[0, 1] as the value of the missing probability. If it is known, on the other hand, that
an event has better than even chances of occurring, such knowledge will be represented
by the [0.5, 1] interval. More generally, probabilistic knowledge may be represented
as a set of possible probability values in a hyperdimensional cube, as in the convex
probabilities approach of Kyburg [20].

The corresponding probabilistic calculi are straightforward conceptual extensions
of the classic, number-based calculus. Such extensions produce, for example, intervals
of expected utility values on the basis of knowledge expressed as sets of possible
probability values. These intervals may be used, in many instances, to rank decisions
in the same way that such choices are ordered with number-based schemes. When
this ordering is not possible (e.g., overlapping intervals show that under certain
scenarios A is preferable to B, while, in other situations, B is to be preferred), the lack
of a clear choice does not imply that the decision-theoretic apparatus is defective.
Rather, the methodology is rich enough to tell us precisely how far empirical
knowledge, combined with the laws of rational thought, can take us. If, beyond that point,
it is imperative to do something (a rather unfortunate set of events), any selection
scheme, from that point on, will be as rational as any other (i.e., very little).
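
The ranking of decisions by intervals of expected utility can be sketched as follows. This is an illustration under stated assumptions, not the report's procedure: a single uncertain event, hypothetical utilities, and a simple, conservative comparison rule (one interval lying entirely above the other) chosen here for clarity.

```python
def expected_utility_interval(prob_interval, utility_if_true, utility_if_false):
    """Range of expected utility when Prob(event) lies in prob_interval."""
    low, high = prob_interval
    # Expected utility is linear in the probability, so extremes occur at endpoints.
    at_low = low * utility_if_true + (1 - low) * utility_if_false
    at_high = high * utility_if_true + (1 - high) * utility_if_false
    return (min(at_low, at_high), max(at_low, at_high))

def compare(interval_a, interval_b):
    """Return 'A', 'B', or 'undetermined' when the intervals overlap."""
    if interval_a[0] > interval_b[1]:
        return "A"
    if interval_b[0] > interval_a[1]:
        return "B"
    return "undetermined"

# The event is known only to have better than even chances: Prob in [0.5, 1].
knowledge = (0.5, 1.0)
decision_a = expected_utility_interval(knowledge, 10, 0)   # hypothetical payoffs
decision_b = expected_utility_interval(knowledge, 4, 4)
decision_c = expected_utility_interval(knowledge, 8, 2)
print(compare(decision_a, decision_b), compare(decision_a, decision_c))
```

The "undetermined" outcome is not a failure of the method: it reports that the available knowledge does not settle the choice.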

Although the manipulation of intervals and sets of possible probability values
alleviates some conceptual worries, it hardly helps in terms of the ability to perform
the required computations. The situation, unfortunately, is made worse by the need
to represent and manipulate probability bounds for subsets without the simplifying
help that additivity provides for actual probability values. This unfortunate state
of affairs is the primary reason for the popularity that one approach, capable of
being interpreted in terms of interval probabilities, enjoys today as one of the major
methodologies of approximate reasoning. This approach is the Dempster-Shafer
calculus of evidence.

Originally developed by Dempster [7] in the context of statistical studies, the
approach was further developed by Shafer [36] as a non-Bayesian alternative to the
representation and manipulation of degrees of belief. Recently [32], application of
possible-world semantic models to the interpretation of its major structures has shown
that the approach is fully consistent with the classical calculus of probability,
including the Bayes-Laplace formula. Smets [40] has also recently reviewed the structures
of the calculus of evidence, proposing, in addition, unconventional extensions based
on a nonprobabilistic concept of belief.

The calculus of evidence may be readily understood using our basic model if it is
recalled that, whenever assessing the validity of a hypothesis on the basis of empirical
knowledge, there are three possible logical outcomes of any reasoning process: the
hypothesis may be proved to be true, the hypothesis may be proved to be false, or
the information may be insufficient to reach either of those conclusions.

If the notation Kp is used to denote the set of situations, i.e., possible worlds,
where p can be proved true, if K¬p correspondingly denotes those cases where it
can be proved false, and if Ip denotes the set of situations where the truth value of
p cannot be established without ambiguity, then it is obvious that any probability
function Prob(·) will satisfy the equation

$$\mathrm{Prob}(Kp) + \mathrm{Prob}(K\neg p) + \mathrm{Prob}(Ip) = 1.$$

Furthermore, since the probability of Ip may be positive, it will be true, in general,
that

$$\mathrm{Prob}(Kp) + \mathrm{Prob}(K\neg p) \le 1.$$

The calculus of evidence is based on the representation of the probabilistic
information conveyed by evidence by means of belief functions. These functions may
be readily interpreted in terms of the above probabilities of provability through the
equation

$$\mathrm{Bel}(p) = \mathrm{Prob}(Kp).$$

More importantly, these belief functions are usually expressible in a compact form by
means of basic probability assignments or mass functions. These functions m, which
are also defined over propositions, are related to belief functions by the equation

$$\mathrm{Bel}(p) = \sum_{q \Rightarrow p} m(q).$$

The ability to represent and manipulate probability intervals by means of mass
functions is the major reason for the appeal of the Dempster-Shafer methodology.
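
The relation between mass and belief functions can be sketched in a few lines. The frame, the mass values, and the function names below are hypothetical; propositions are represented as sets of elementary outcomes, so that "q implies p" becomes "q is a subset of p". The helper `plausibility` computes the quantity 1 - Bel(not p), the upper end of the probability interval.

```python
from fractions import Fraction

# Hypothetical frame and mass function over subsets (focal elements).
FRAME = frozenset({"rain", "snow", "dry"})
mass = {
    frozenset({"rain"}): Fraction(1, 2),
    frozenset({"rain", "snow"}): Fraction(3, 10),
    FRAME: Fraction(1, 5),   # mass on the whole frame represents ignorance
}

def belief(p):
    """Bel(p): total mass of focal elements q that imply p (q subset of p)."""
    return sum((m for q, m in mass.items() if q <= p), Fraction(0))

def plausibility(p):
    """Upper bound 1 - Bel(not p) on the probability of p."""
    return 1 - belief(FRAME - p)

wet = frozenset({"rain", "snow"})
snow = frozenset({"snow"})
print(belief(wet), plausibility(wet))    # interval for "rain or snow"
print(belief(snow), plausibility(snow))  # interval for "snow"
```

Three mass values suffice here to produce probability intervals for every proposition over the frame.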

Although, in a typical decision problem, we are interested in the truth of p rather
than its provability, lack of adequate information precludes determination of the
probability of such truth. In general, however, it may be said that

$$\mathrm{Bel}(p) \le \mathrm{Prob}(p) \le 1 - \mathrm{Bel}(\neg p).$$

Furthermore, these bounds cannot be improved.

This interpretation of the Dempster-Shafer calculus as concerned with probabilities
of provability, as called by Pearl [27], was first formalized by the author using a
possible-worlds model based on the use of a modal logic called epistemic logic. The
formal system, which is equivalent to the modal system S5 [19] used by Moore [25] in
his pioneering work on the application of modal logic concepts to artificial intelligence
problems, is enhanced by consideration of probability distributions over the set of
possible worlds. In particular, the unary operator K represents the knowledge of a
rational agent: Kp holds in the possible worlds where p is known, or can be proved,
to be true.

The probability of the set of all possible worlds where a proposition p is the most 
specific proposition that is known to be true, called the epistemic set, corresponds to 
the values of the mass function. In any possible world, this most specific knowledge is 
the conjunction of all propositions that are known to be true in that possible world. 

The semantic model of the Dempster-Shafer theory also validates the so-called
Dempster’s rule of combination, which permits the combination of belief and mass
functions corresponding to evidential observations made under certain conditions of
independence. When such conditions are not valid, use of this formula leads, of
course, to erroneous results, often, although incorrectly, considered to be an
essential handicap of the evidential reasoning approach, rather than a consequence of its
misapplication.
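
The text does not state the combination formula. The sketch below uses the standard textbook form of Dempster's rule (products of masses assigned to intersections of focal elements, renormalized by the non-conflicting mass); the frame and both mass functions are hypothetical. As noted above, the result is meaningful only when the two bodies of evidence are independent.

```python
from fractions import Fraction
from itertools import product

def combine(m1, m2):
    """Dempster's rule: multiply masses, intersect focal elements, renormalize."""
    combined = {}
    conflict = Fraction(0)
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, Fraction(0)) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b   # mass falling on contradictory pairs
    if conflict == 1:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {focal: m / (1 - conflict) for focal, m in combined.items()}

FRAME = frozenset({"rain", "snow", "dry"})
m1 = {frozenset({"rain"}): Fraction(3, 5), FRAME: Fraction(2, 5)}
m2 = {frozenset({"rain", "snow"}): Fraction(1, 2),
      frozenset({"dry"}): Fraction(1, 4),
      FRAME: Fraction(1, 4)}
m12 = combine(m1, m2)
for focal, m in m12.items():
    print(sorted(focal), m)
```

In this example 3/20 of the product mass is conflicting and is removed by the renormalization.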

From our perspective, the only substantial example of such misapplication is that
which results from improper use of Dempster’s rule of conditioning, i.e., a
particular use of the rule of combination that is valid only under special circumstances,
as a substitute for Bayes' rule. Certain methodological limitations of the calculus of
evidence, notably the lack of methods to handle with sufficient generality the
counterparts of conventional conditional probabilities, are more worrisome, in our opinion,
than any distress arising from its misuse or its supposed lack of a decision-making
apparatus.


4.3 Possibilistic Reasoning 

Our basic semantic model also provides straightforward interpretations [33] for the
major concepts and structures of possibility theory [46,9]: an approach to
approximate reasoning derived from multivalued logics [31] and the theory of fuzzy sets [45].
The major formal tool that enhances our understanding of such structures is not a
probabilistic measure of set size but, rather, a binary measure of proximity or
distance, called a similarity relation.

Similarity considerations play a major role in human cognitive processes [44].
Informally, all such analogical processes are based on the notion that the validity of
some propositions in a given situation extends also to other situations where the
same basic conditions are prevalent.

In our model of possibilistic structures, the similarity between states of affairs is
expressed by a function that assigns a number between 0 and 1 to every pair of possible
worlds. The value of that function S(w, w') for a pair of possible worlds quantifies the
extent of resemblance between pairs of situations or scenarios, as evaluated from the
viewpoint of the particular problem being considered. In a decision-making problem,
for example, the decision maker may define such measures to describe the extent to
which the consequences of certain decisions resemble desirable goals or objectives.

The highest similarity value, 1, indicates that, from the perspective of the system
being studied, both situations are indistinguishable. The lowest value, 0, indicates
that knowledge of what is true in one possible world does not help to derive what is
true in the other.

Similarity scales are the measurement sticks used to describe the extent to which
certain results may be extrapolated from one possible world to another. Unlike
probability functions, which correspond to either measurable properties of physical systems
or states of belief of rational agents, the similarity relations simply provide a
mechanism to describe resemblance between states of affairs.

Similarity relations may also be regarded as generalizations of the modal-logic
notion of accessibility or conceivability [19] by introduction of multiple binary relations
$R_\alpha$ between possible worlds (one for each value of $\alpha$ between 0 and 1), defined by

$$R_\alpha(w, w') \text{ if and only if } S(w, w') \ge \alpha.$$

These relations also justify the use of a possibilistic terminology that regards
propositions as being possible to some degree, thereby generalizing the classical definition of
the modal operator for possible truth in a manner similar to that used by Lewis [23]
in his treatment of counterfactual statements.

Certain requirements must be imposed to assure that similarity functions truly 
represent notions of resemblance between possible situations. Similarities between 
identical scenarios, for example, should have a value of 1, the highest possible value. 
Furthermore, if two different possible worlds are to be distinguished by means of 
similarity values, then it also makes sense to require that their similarity be strictly 
less than 1. It is likewise natural to require that the similarity between two particular 
scenarios be a symmetric function, i.e., w resembles w' as much as w' resembles w. 

Beyond these properties of reflexivity and symmetry, it is also necessary to require
that similarities satisfy a generalized form of transitivity. If, given three possible worlds
w, w', and w'', the worlds w and w' are highly similar while w' and w'' are also highly
similar, it would be unreasonable to say that w and w'' may be highly dissimilar. The
value of S(w, w'') must, therefore, be bounded from below by a function of S(w, w') and
S(w', w''), as expressed by the condition

$$S(w, w'') \ge S(w, w') \circledast S(w', w''),$$

which uses the binary operation $\circledast$ to denote the required function.

If certain reasonable requirements are imposed upon the function $\circledast$, it is easy
to see that this function has the properties of triangular norms, which are usually
introduced in multivalued logics [43] to relate the truth value of a conjunction $p \wedge q$
to the degrees of truth of p and q. These functions are motivated, in our model, by
considerations that are related solely to metric concepts of proximity and resemblance.
Important examples of triangular norms are given by the functions

$$a \circledast b = \min(a, b), \quad a \circledast b = \max(a + b - 1, 0), \quad \text{and} \quad a \circledast b = ab,$$

called the Zadeh, Łukasiewicz, and product triangular norms, respectively.
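
The three triangular norms, and the transitivity requirement they express, can be tried on a small example. This is a sketch under stated assumptions: the worlds are hypothetical temperatures in degrees Fahrenheit, and the similarity (decaying linearly to 0 over 20 degrees) is an illustrative choice, not one taken from the report.

```python
from fractions import Fraction
from itertools import product

def t_zadeh(a, b):
    return min(a, b)

def t_lukasiewicz(a, b):
    return max(a + b - 1, 0)

def t_product(a, b):
    return a * b

def similarity(w, w_prime):
    """Assumed similarity between two temperatures, linear over 20 degrees."""
    return max(Fraction(0), 1 - Fraction(abs(w - w_prime), 20))

def is_transitive(worlds, t_norm):
    """Check S(w, w'') >= t_norm(S(w, w'), S(w', w'')) for every triple."""
    return all(similarity(w, w2) >= t_norm(similarity(w, w1), similarity(w1, w2))
               for w, w1, w2 in product(worlds, repeat=3))

worlds = [70, 75, 78, 85, 95]
print(is_transitive(worlds, t_lukasiewicz))  # holds for this similarity
print(is_transitive(worlds, t_zadeh))        # fails, e.g., for 70, 75, 78
```

The same similarity function can therefore be transitive with respect to one triangular norm and not another, so the choice of norm is part of the model.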

Similarity functions are trivially related by the relation

$$\delta = 1 - S$$

to functions $\delta$ that have the properties of a distance or metric function. In the
particular case where $\circledast$ is the triangular norm of Łukasiewicz, $\delta$ is an ordinary
metric or distance, which obeys the well-known triangular inequality

$$\delta(w, w'') \le \delta(w, w') + \delta(w', w'').$$

If $\circledast$ is the Zadeh triangular norm, on the other hand, the transitivity property is
equivalent to the stronger ultrametric inequality

$$\delta(w, w'') \le \max(\delta(w, w'), \delta(w', w'')).$$
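
A short derivation, offered as a sketch, shows why these two inequalities are equivalent to transitivity. The shorthand $a = S(w, w')$, $b = S(w', w'')$, and $c = S(w, w'')$ is introduced here for brevity and is not in the original; the dissimilarity is one minus the similarity, and transitivity is the requirement that $c$ be at least the triangular norm of $a$ and $b$.

```latex
% Lukasiewicz norm: transitivity reads c >= max(a + b - 1, 0).
% The requirement c >= 0 always holds, since similarities lie in [0, 1].
\begin{align*}
c \ge a + b - 1
  &\iff 1 - c \le (1 - a) + (1 - b) \\
  &\iff \delta(w, w'') \le \delta(w, w') + \delta(w', w'').
\end{align*}

% Zadeh norm: transitivity reads c >= min(a, b).
\begin{align*}
c \ge \min(a, b)
  &\iff 1 - c \le 1 - \min(a, b) = \max(1 - a,\; 1 - b) \\
  &\iff \delta(w, w'') \le \max\bigl(\delta(w, w'),\, \delta(w', w'')\bigr).
\end{align*}
```

Since a maximum of two nonnegative numbers never exceeds their sum, the second inequality implies the first, which is the sense in which it is stronger.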

The structures introduced by similarity relations may be readily applied to
generalize the subset inclusion relations that are the fundamental basis of deductive
reasoning. These inclusion relations are typically expressed by conditional
propositions of the form “If q, then p,” stating that any state of affairs where q is true is such
that p is also true. These conditional propositions, which permit the derivation of
true propositions from knowledge of the truth of others by means of the rule of modus
ponens, may be also stated using similarity structures by saying that any q-world has
a p-world (i.e., itself) that is as similar as possible to it.

The ability to characterize proximity between possible worlds using a continuous
scale of similarity provides for a more general characterization of the inclusion
relations that hold between subsets of possible worlds (i.e., propositions). If the subset
of q-worlds is not included in that of p-worlds, we may, however, use the similarity
structure to quantify the amount of stretching required to reach a p-world from any
q-world. The degree of implication function defined by the expression

$$I(p \mid q) = \inf_{w' \vdash q} \, \sup_{w \vdash p} S(w, w'),$$

which is related to the well-known Hausdorff distance, provides such quantification
as the size of the topological neighborhood of p that encloses q, as shown in Figure 2.



The ability to express relationships between neighborhoods of different sets of
possible worlds or, equivalently, between propositions permits the generalization of
the modus ponens by use of the transitive property of the degree of implication
function:

$$I(p \mid r) \ge I(p \mid q) \circledast I(q \mid r),$$

illustrated in Figure 3.
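
The transitive property can be checked numerically on a small case. This sketch assumes the Łukasiewicz triangular norm and a hypothetical linear similarity between temperature worlds, which is transitive with respect to that norm; the three propositions are illustrative sets of worlds.

```python
from fractions import Fraction

def similarity(w, w_prime):
    """Assumed similarity between temperatures, linear over 20 degrees."""
    return max(Fraction(0), 1 - Fraction(abs(w - w_prime), 20))

def t_lukasiewicz(a, b):
    return max(a + b - 1, Fraction(0))

def implication(p_worlds, q_worlds):
    """Degree of implication I(p|q) over finite sets of worlds."""
    return min(max(similarity(w, v) for w in p_worlds) for v in q_worlds)

p, q, r = [70, 72], [75, 78], [80, 84]   # hypothetical propositions

direct = implication(p, r)
bound = t_lukasiewicz(implication(p, q), implication(q, r))
print(direct, bound)
```

Chaining the two intermediate degrees yields a guaranteed lower bound on I(p|r) without inspecting the p-worlds and r-worlds together. In this example the bound happens to be attained exactly.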



The generalized modus ponens rule of Zadeh [46] is expressed by means of
possibility distributions, which are themselves defined in terms of similarities between
evidential worlds and those satisfying a given proposition p [33]. From the viewpoint
of our similarity-based model, the generalized modus ponens may be thought of as
a sound rule of logical extrapolation that exploits similarities between conceivable
scenarios or situations. The fundamental topological structures that permit this type
of reasoning are clearly different in character and nature from the measures of set
extension that are the conceptual basis of probabilistic reasoning.

In closing, it is important to mention that possibilistic reasoning based on fuzzy
logic has led recently to the implementation of a large number of successful commercial 
products [41]. These systems, which have primarily exploited the applicability of the 
technology to a variety of control devices, provide a clear indication of the usefulness 
of these ideas, which now also rest on clearly understandable theoretical foundations. 

5 Looking ahead 

The ability to explain the role and utility of the major approximate reasoning
approaches by use of a unifying framework provides the rational basis to resolve most
of the issues about relative importance and necessity. Rather than supporting any
partisan contention about the superiority of one methodology over the others, this
framework shows instead that a variety of tools are needed to produce effective
descriptions of evidence and its implications.

Each methodology may play a significant role in every potential application of
approximate reasoning techniques: a role that complements rather than substitutes
for other procedures. In the absence of compelling theoretical arguments for rejecting
any approximate reasoning position and in the presence of substantial solid evidence
of their usefulness and applicability, it is irrational to maintain positions that are
needlessly divisive and polemic.

Recent investigations showing that there exist substantial functional rather than
conceptual similarities between the network-oriented methods of conventional
probabilistic schemes and the calculus of evidence [37], and indicating that fuzzy-set
concepts and multivalued logic may be successfully blended to represent vague knowledge
about probabilities [2], clearly point the way toward a more productive research
collaboration between approximate reasoning specialists.

This collaboration should stress application of all valid concepts to the solution of 
practical problems rather than further continuation of the controversy about technological
superiority or necessity. In particular, the example set by Japanese researchers
in the development of a large number of commercial products of evident applicability 
illuminates the path that must be followed. The future lies in the solution of practical
problems, both because of the direct importance of those problems, and because
conceptual developments and clarifications usually follow, as is the case of the work 
discussed in this note, from the experiences gained producing such solutions. Having 
established needed conceptual bases to clarify controversial issues, we hope it is clear
that this is the time to apply ideas rather than to continue to argue about them. 
Abstract


This note presents a formal semantic characterization of the major concepts and constructs of
fuzzy logic in terms of notions of distance, closeness, and similarity between pairs of possible worlds. 
The formalism is a direct extension (by recognition of multiple degrees of accessibility, conceivability, 
or reachability) of the major modal logic concepts of possible and necessary truth. 

Given a function that maps pairs of possible worlds into a number between 0 and 1, generalizing 
the conventional concept of an equivalence relation, the major constructs of fuzzy logic (i.e., conditioned
and unconditional possibility distributions) are defined in terms of this generalized similarity
relation using familiar concepts from the mathematical theory of metric spaces. This interpretation
is different in nature and character from the typical, chance-oriented, meanings associated with probabilistic
concepts, which are grounded on the mathematical notion of set measure. The similarity
structure defines a topological notion of continuity in the space of possible worlds (and in that of its 
subsets, i.e., propositions) that allows a form of logical “extrapolation” between possible worlds. 

This logical extrapolation operation corresponds to the major deductive rule of fuzzy logic
(the compositional rule of inference or generalized modus ponens of Zadeh), an inferential operation
that generalizes its classical counterpart by virtue of its ability to be utilized when propositions
representing available evidence only match approximately the antecedents of conditional propositions.
The relations between the similarity-based interpretation of the role of conditional possibility
distributions and the approximate inferential procedures of Baldwin are also discussed. 

A straightforward extension of the theory to the case where the similarity scale is symbolic 
rather than numeric is described. The problem of generating similarity functions from a given set of 
possibility distributions, with the latter interpreted as defining a number of (graded) discernibility 
relations and the former as the result of combining them into a joint measure of distinguishability 
between possible worlds, is briefly discussed. 
1 INTRODUCTION


This note presents a semantic characterization of the major concepts and constructs of fuzzy logic 
in terms of notions of similarity, closeness, and proximity between possible states of a system that 
is being reasoned about. Informally, a “possible state” (to be formalized later using the notion of 
“possible world”) is an assignment of a well-defined truth-value (i.e., either true or false) to all 
relevant declarative knowledge statements about that system. 

The primary goal that guided the research leading to the results presented in this work has been 
one of conceptual clarification. A great deal of energy has been directed in the past few years to debating
the methodological necessity and relative merits of various approximate reasoning methodologies. As
a result of these exchanges, the need to consider certain nonclassical approaches has been questioned
on a variety of bases. 

Recognizing the need for the development of sound semantic formalisms that shed light on the 
nature of different approaches, the author has pursued, in the past few years, a line of theoretical 
research seeking to describe various approximate reasoning methodologies using a common framework.
These investigations have recently shown the close connection between the Dempster-Shafer
calculus of evidence [35] and epistemic logics. This relationship was elucidated by straightforward 
application of conventional probabilistic concepts to models of knowledge-states that distinguish 
between the truth of a proposition and knowledge (by rational agents) of that truth. Central to 
this development is the notion of “possible world” used by Carnap [6] to develop logical bases for 
probability theory. 

The same central notion of possible state of affairs is also the conceptual basis of the results 
presented in this note, which is aimed at establishing the semantic bases of possibilistic logic with 
emphasis on the study of its possible relations and differences, if any, with probabilistic reasoning. 

The results of this investigation clearly show that possibilistic logic can be interpreted in terms 
of nonprobabilistic concepts that are related to the notions of continuity and proximity. The major 
functional structures of fuzzy logic, i.e., possibility and necessity distributions, 1 may be defined in 
terms of the more primitive notion of similarity between possible states of a system using constructs 
that are the direct extension of well-known concepts in the theory of metric spaces. The topological 
metric structure that is so defined may be used to derive a sound inferential rule that is a form 
of logical “extrapolation.” This rule is also shown to be the compositional rule of inference or 
generalized modus ponens proposed by Zadeh [53]. Conversely, possibility distributions (expressing
resemblance from some specific regard) may be used to derive the actual similarity functions
(discerning between possible worlds from the joint viewpoint of several respects).

The constructs that are used to derive the interpretation presented in this note are formally, 
structurally, and conceptually different from those that explain probabilistic reasoning, in either
its objective or subjective interpretations, irrespective of methodological reliance on interval-based 
approaches to represent ignorance. The latter class of methods—measuring the relative proportion 

1 It is important to remark that the scope of this work is limited to the most fundamental concepts and constructs
of fuzzy logic without examining related notions such as, for example, generalized quantifiers.
of (either observed or believed) occurrence of some event—are based on the mathematical notion of
set measure, while the former—seeking to establish similarities between situations that may be used 
for analogical reasoning—are related to the theory of distances and metric spaces. 

This presentation of the relationships between similarity-based concepts and possibilistic notions, 
while grounded on a formal treatment that is based on rigorous logical and mathematical formalisms, 
will be kept at a level that is as informal as possible. The purpose of this presentation style is 
to facilitate comprehension of major ideas without the clutter that would need to be otherwise 
introduced to keep matters strictly precise. For this reason, we will refrain from formal introduction 
of structures and axiom schemata, that, although correct and proper, may encumber understanding 
of the basic concepts. 

Before we proceed to the detailed consideration of semantic models, I must briefly remark on 
the epistemological implication of these developments. The present interpretation is not claimed 
to be the only one that may be advanced to define the notion of possibility in terms of simpler 
concepts, nor do I claim that it may not be sometimes possible, even desirable, to model possibilistic 
structures from other bases. My intent is not to prove the conceptual superiority of one approach 
over another or to argue about the relative utility of different technologies. Rather, I hope that these 
results have contributed to establishing the basic conceptual differences in the treatment of imprecise
and uncertain information that are inherent in probabilistic and possibilistic methods; the former 
oriented toward quantifying believed or measured frequency of occurrence, and the latter seeking to 
determine propositions—implied by the evidence—that are similar, in some sense, to a hypothesis 
of interest. In other words, beyond accidental domain-specific relations, both types of methods are 
needed to analyze and clarify the significance of imprecise and uncertain information.
2 APPROXIMATE REASONING AND POSSIBLE WORLDS


Our point of departure is the model-theoretic formalisms of modal logics. Let us assume that 
declarative statements about the state, situation, or behavior of a real-world system under study 
are symbolically represented by the letters of some alphabet 

which are combined in the customary way using the logical operators $\lor$, $\land$, $\to$, and $\leftrightarrow$ (to be
interpreted with their usual meanings) to derive a language $\mathcal{L}$ (i.e., a collection of sentences).
Furthermore, we augment this language by use of two unary operators $\mathbf{N}$ and $\Pi$, called the necessity
and possibility operators, respectively, having usage governed by the rule

If $\phi$ is a sentence, then $\mathbf{N}\phi$ and $\Pi\phi$ are also sentences,

introducing the ability to represent different modalities for the truth of propositions.

A model for this propositional system is a structure consisting of three components: 

1. A nonempty set of possible worlds U introduced to represent states, situations, or behaviors 
of the system being modeled by our sentences. In what follows we will refer to this set as the 
universe of discourse, or universe, for short. 

We will also need to consider a nonempty subset $\mathcal{E}$ of the universe $U$, which is introduced
to model the set of conceivable worlds that are consistent with observed evidence. This set
(possibly equal to the whole universe $U$) will be called the evidential set. Throughout this
note, we will assume that evidence about the world is always given by means of conventional
propositions that allow us to determine, without ambiguity, whether a possible world either is or
is not a member of the evidential set. 2

2. A function (called a valuation) that assigns one and only one of the truth values true or false
to every possible world $w$ in the universe $U$ and every sentence $\phi$ in the language. Assignment
of the truth-value true to a pair $(w, \phi)$ will be denoted $w \vdash \phi$ (i.e., $\phi$ is true in the world $w$).

In what follows, we will use the same symbols to describe subsets of possible worlds and the 
propositions that are true only in worlds that are members of such subsets. For example, the 
symbol $\mathcal{E}$ will be used to denote both the evidential set and the proposition that asserts the
validity of the corresponding evidential observations. Using this notation, for example, we
will write $w \vdash \mathcal{E}$ to indicate that the world $w$ is compatible (i.e., logically consistent) with the
evidence $\mathcal{E}$.

Furthermore, we will use the symbol $\mathcal{L}$, introduced above as a set of well-formed sentences,
to denote also the power set of the universe $U$. Rigorously, subsets of $U$ strictly correspond
to the classes of equivalence of the sentence set $\mathcal{L}$ that are obtained by equating logically
equivalent sentences. In the same simplifying vein, we will drop also the customary distinction 


2 For the sake of simplicity, fuzzy evidential facts such as “Tom is rich,” usually considered in fuzzy logic, will not
be treated in this note. The meaning of such assertions will be discussed in a forthcoming paper.
between sentences—the linguistic expressions of something that may be true or false—and
propositions—the actual things being asserted. 

3. A binary relation $R$, between possible worlds, called the accessibility, conceivability, or reachability
relation, introduced to model the semantics of the modal operators $\mathbf{N}$ and $\Pi$.

It is not necessary to review here the well-known axioms [21] that restrict the assignment of 
truth values to well-formed sentences according to the rules of propositional logic. To facilitate
comprehension of our formalism, we need to recall solely the rules that constrain assignment of 
truth values to sentences formed by prefixing other valid expressions with the modal operators, i.e., 

1. The sentence $\phi$ is necessarily true in the possible world $w$ (i.e., $w \vdash \mathbf{N}\phi$) if and only if it is true
in every world $w'$ that is related to the world $w$ by the relation $R$.

2. The sentence $\phi$ is possibly true in the possible world $w$ (i.e., $w \vdash \Pi\phi$) if and only if it is true
in some world $w'$ that is related to the world $w$ by the relation $R$.
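
The two truth rules above can be sketched on a small finite model. The following is only an illustration under stated assumptions: the world names, the single sentence "rain", and the relation `R` are hypothetical and do not come from the report.

```python
from typing import Callable, Set, Tuple

World = str
Prop = Callable[[World], bool]
Relation = Set[Tuple[World, World]]

def necessarily(prop: Prop, w: World, R: Relation) -> bool:
    """N prop holds at w iff prop is true in every world related to w by R."""
    return all(prop(v) for (u, v) in R if u == w)

def possibly(prop: Prop, w: World, R: Relation) -> bool:
    """Pi prop holds at w iff prop is true in some world related to w by R."""
    return any(prop(v) for (u, v) in R if u == w)

# Hypothetical universe and valuation of a single sentence.
worlds = ["w1", "w2", "w3"]
rain = {"w1": True, "w2": True, "w3": False}

def is_rain(w: World) -> bool:
    return rain[w]

# Hypothetical accessibility relation: w1 and w2 see each other; w3 sees only itself.
R = {("w1", "w1"), ("w1", "w2"), ("w2", "w1"), ("w2", "w2"), ("w3", "w3")}

# The relation U x U gives the original reading of logical necessity.
R_all = {(u, v) for u in worlds for v in worlds}

nec_w1 = necessarily(is_rain, "w1", R)          # rain holds in w1 and w2
pos_w3 = possibly(is_rain, "w3", R)             # w3 only reaches itself
nec_w1_all = necessarily(is_rain, "w1", R_all)  # fails because of w3
pos_w3_all = possibly(is_rain, "w3", R_all)     # succeeds because of w1
```

Changing the relation from `R` to `R_all` changes which modal sentences hold, which is the sense in which the choice of R fixes the interpretation of the operators.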

If, for example, the relation $R$ relates worlds that share the same (possibly empty) subset of true
sentences of the prespecified set of expressions

i.e., $R(w, w')$ if and only if any sentence $\phi$ in that set is either true in both $w$ and $w'$ or it is false in both
$w$ and $w'$, then the resulting system has an “epistemic” interpretation that regards related possible
worlds as “being possible for all we know” (i.e., observed evidence, corresponding to a subset of
that set, is the same for both worlds). In this case, the necessity operator $\mathbf{N}$ corresponds to the epistemic
operator K of epistemic logics, with the corresponding system having the properties of the modal 
system S5, which was used—in the context of probability theory—as the semantic basis for the 
Dempster-Shafer calculus of evidence [35]. 

If, on the other hand, the original interpretation of logical necessity (corresponding to a relation
$R$ that is equal to $U \times U$, i.e., that relates every pair of possible worlds) is given to the operator $\mathbf{N}$,
then a proposition is necessarily true if and only if it is true in every possible world.

If the relation R is chosen as 

$$R = \mathcal{E} \times \mathcal{E},$$

then this interpretation may be used to characterize approximate reasoning problems as those where
a hypothesis of interest is neither necessarily true nor necessarily false in worlds in the evidential
set $\mathcal{E}$, reflecting the inability of conventional deductive techniques to unambiguously determine the
truth-value of the hypothesis. 3

In those problems, in spite of this fundamental impossibility, we may resort to approximate reasoning
methods to describe various properties of the evidential set $\mathcal{E}$. For example, the probabilistic
structures utilized by various probabilistic reasoning approaches typically characterize relations of
the form

$$\mu(H \land \mathcal{E}) : \mu(\lnot H \land \mathcal{E}),$$

between the “measures” of the subsets of the evidential set $\mathcal{E}$ where a hypothesis $H$ is true or false,
respectively. 

3 The notion of approximate reasoning problem is often extended to encompass situations where deductive techniques
cannot always be used because of practical limitations on computational resources.
Our aim will be to study how other structures, defining a metric or distance in the universe U,
may be used to describe the nature of the evidential set. To do so, we will assign a different meaning 
to the accessibility relation, giving it an interpretation that regards related worlds as “similar” or 
“close” in some sense. We will require, however, a scheme that is richer than that provided by a 
single relation so that we can extend modal notions and derive semantic bases for fuzzy logic, which
relies on concepts of degrees of matching or closeness expressed by real numbers between 0 and 1. 

In what follows we will use the symbols $\Rightarrow$ and $\Leftrightarrow$ to denote strong implication and equivalence,
respectively. A proposition $q$ strongly implies $p$ (denoted $q \Rightarrow p$) if and only if $p$ is true in any world
where $q$ is. Similarly, $p$ is logically equivalent to $q$ (denoted $p \Leftrightarrow q$) if and only if $p$ and $q$ are true in
the same subset of worlds of $U$.

Following traditional terminology, we will say also that a proposition $p$ is satisfiable if there exists
a possible world $w$ such that $w \vdash p$.
3 EXTENDED MODALITIES


We first turn our attention to the problem of generalizing modal logic formalisms to explain the
structures and functions of fuzzy logic. 

A number of authors have studied various relations between fuzzy and modal logics. Lakoff [24],
Murai et al. [28], and Schotch [36] have proposed graded generalizations of basic modal constructs.
Dubois and Prade [13,14] have also explored analogies between these nonstandard logics. In a recent
paper [12], they have developed, in addition, a modal basis for possibility theory by means of the 
introduction of fuzzy structures into modal frameworks with the goal of deriving proof mechanisms 
that may be used in possibilistic reasoning. 

The goal for the model presented in this note is somewhat different from the objectives guiding 
those efforts. We will seek explanations for possibilistic constructs on the basis of previously existing 
notions rather than generalizations of modal frameworks by means of fuzzy constructs. The model 
presented here is not based on the use of graded notions of possibility and necessity as primitive 
—and, by implication, easy to understand—structures. The foundation for this model is provided 
by a generalization of the accessibility relation, which is given a simple interpretation as a measure 
of resemblance and proximity between possible worlds. 

We will extend the notion of accessibility relation to encompass a family of nonempty binary 
relations $R_\alpha$ that are indexed by a numerical parameter $\alpha$ between 0 and 1. These relations, which
are nested, i.e.,

$$R_\alpha \subseteq R_\beta, \quad \text{whenever } \beta \le \alpha,$$

are introduced to represent different degrees of similarity, using a scheme that is akin to that used 
by Lewis in his study of counterfactuals [25]. The family of accessibility relations introduced here
differs from that proposed by Lewis, however, in its use of numerical indexes 4 and in the nature 
of the overall modeling goals that, in Lewis’ formalism, are intended to represent changes of scale 
induced by consideration of different restrictive statements. 

3.1 Similarity Relations 

To facilitate the definition of a family of accessibility relations we introduce a similarity function 

$$S : U \times U \mapsto [0,1],$$

assigning to each pair of possible worlds $(w, w')$ a unique degree of similarity between 0 (corresponding
to maximum dissimilarity) and 1 (corresponding to maximum similarity).

With the help of this function, we will then say that $w$ and $w'$ are related to the degree $\alpha$,
denoted $R_\alpha(w, w')$, if and only if $S(w, w') \ge \alpha$. In this way, the relations $R_\alpha$ have the required
nesting property with $R_0$ corresponding to the whole Cartesian product $U \times U$ (or, every possible
world is at least similar in a degree zero to every other possible world).
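
As a minimal sketch of how a similarity function induces the nested family of accessibility relations, the following computes a few of the relations for three worlds. The worlds and similarity values are hypothetical, chosen only for illustration.

```python
from itertools import product

worlds = ["a", "b", "c"]

# Hypothetical similarity values: symmetric, with S(w, w) = 1.
S = {
    ("a", "a"): 1.0, ("b", "b"): 1.0, ("c", "c"): 1.0,
    ("a", "b"): 0.8, ("b", "a"): 0.8,
    ("b", "c"): 0.5, ("c", "b"): 0.5,
    ("a", "c"): 0.5, ("c", "a"): 0.5,
}

def accessibility(alpha):
    """R_alpha: the set of pairs of worlds whose similarity is at least alpha."""
    return {(w, v) for w, v in product(worlds, repeat=2) if S[(w, v)] >= alpha}

R_0 = accessibility(0.0)     # the whole Cartesian product U x U
R_half = accessibility(0.5)
R_high = accessibility(0.8)
R_one = accessibility(1.0)   # only pairs with maximum similarity

# Nesting: a larger index gives a smaller (more demanding) relation.
nested = R_one <= R_high <= R_half <= R_0
```

Raising the index can only remove pairs, so the family shrinks from the full product at index 0 toward the pairs of maximally similar worlds at index 1.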

4 We will later see that similarities may be measured using more general, nonnumeric, scales. For simplicity reasons, 
we will avoid at this point the introduction of more general schemes that unnecessarily complicate the exposition. 
Some properties are required to assure that the function S has the required semantics of a
metric relationship capturing the intuitive notion of similarity or “proximity.” It is first necessary 
to demand that the degree of similarity between any world and itself be as high as possible, i.e., 

S(w, w) = 1, for all w in U . 

This property assures that every one of the accessibility relations $R_\alpha$ will be reflexive and, following
the nomenclature introduced by Zadeh for fuzzy relations [52], we will also say that the similarity 
relation is reflexive. 

Next, we will call for the function $S$ to be symmetric, i.e.,

$$S(w, w') = S(w', w), \quad \text{for any worlds } w \text{ and } w' \text{ in } U.$$

This is a very natural requirement of any relation intended to represent a relation of resemblance 
between objects. 

Finally, and most importantly, we will impose a form of transitivity requirement upon the similarity
function $S$ that turns it into a generalized equivalence relation. The purpose of this restriction
is to assure that $S$ has a reasonable behavior as a metric in the universe of possible worlds. It would
certainly be surprising if, for some similarity $S$, we were to be told that $w$ and $w'$ are very similar
and that $w'$ and $w''$ are also very similar, but that $w$ does not resemble $w''$ at all. Clearly, there
should be a lower bound on the possible values of $S(w, w'')$ that may be expressed as a function of
the values of $S(w, w')$ and $S(w', w'')$. We will express such a constraint using a numeric operation,
denoted $\circledast$, that takes as arguments two real numbers between 0 and 1 and that returns another
number in the same range, i.e.,

$$\circledast : [0,1] \times [0,1] \mapsto [0,1],$$

in the form of the inequality

$$S(w, w'') \ge S(w, w') \circledast S(w', w''),$$

assumed valid for any worlds $w$, $w'$ and $w''$ in the universe $U$. Resorting again to modal terminology,
the above transitivity constraint, which will be called $\circledast$-transitivity, may be rewritten in relational
form as

$$R_{\alpha \circledast \beta} \supseteq R_\alpha \circ R_\beta, \quad \text{for all } 0 \le \alpha, \beta \le 1,$$

making obvious its generalization of the conventional definition of transitivity for ordinary binary
relations, i.e.,

$$R \supseteq R \circ R.$$

Since the role of $\circledast$, through recursive application, is that of providing a lower bound for the
similarity between the two end members $w_1$ and $w_n$ of a chain of possible worlds $[w_1, w_2, \ldots, w_n]$,
it is obvious that the operation $\circledast$ should be commutative and associative. Furthermore, it should
also be nondecreasing in each argument, as it is reasonable to ask that the desired lower bound be
a monotonic function of its arguments. Finally, it is also desirable to ask that

$$\alpha \circledast 1 = 1 \circledast \alpha = \alpha,$$

i.e., that the values of the similarities of two indistinguishable objects to a third should be the same.
These requirements are equivalent to demanding that the operation $\circledast$ be a triangular norm [37],
or T-norm, for short.
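
The transitivity requirement can be checked mechanically on a finite universe. The sketch below is an illustration only: the three worlds and their similarity values are hypothetical, and the product T-norm is included as a familiar extra example that this passage does not itself name.

```python
from itertools import product

def t_min(a, b):
    """Minimum T-norm (the one used in Zadeh's original fuzzy logic)."""
    return min(a, b)

def t_product(a, b):
    """Product T-norm (added here purely for comparison)."""
    return a * b

def t_lukasiewicz(a, b):
    """T-norm of Lukasiewicz."""
    return max(a + b - 1.0, 0.0)

def is_transitive(S, worlds, tnorm, tol=1e-12):
    """Check S(w, w'') >= tnorm(S(w, w'), S(w', w'')) for every triple of worlds."""
    return all(
        S[(w, w2)] + tol >= tnorm(S[(w, w1)], S[(w1, w2)])
        for w, w1, w2 in product(worlds, repeat=3)
    )

# Hypothetical reflexive, symmetric similarity on three worlds.
worlds = ["a", "b", "c"]
S = {(w, w): 1.0 for w in worlds}
for (w, v), s in {("a", "b"): 0.8, ("b", "c"): 0.6, ("a", "c"): 0.5}.items():
    S[(w, v)] = S[(v, w)] = s

min_ok = is_transitive(S, worlds, t_min)          # needs S(a, c) >= 0.6
prod_ok = is_transitive(S, worlds, t_product)     # needs S(a, c) >= 0.48
luk_ok = is_transitive(S, worlds, t_lukasiewicz)  # needs S(a, c) >= 0.4
```

The same similarity table can satisfy the constraint for one T-norm and violate it for another, so the choice of operation determines how strict the lower bound is.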
Triangular norms, originally introduced in the theory of probabilistic metric spaces to treat
certain statistical problems, play a distinguished role in [0,1]-multivalued logics [1,11,17,31] as the
result of imposing reasonable requirements upon operations that produce the truth value of the
conjunction of two expressions as a function of the truth values of the conjuncts. Furthermore,
generalized similarity relations (called B-R relations by Zadeh [54]) also have an important function,
to be examined further later in this note, in the generalization of the inferential rule of modus
ponens [43,10]. Our axiomatic derivation for the requirement that $\circledast$ be a T-norm is based, however,
solely on metric considerations, applied here to a space of possible worlds, but is valid in general 
metric spaces. 

From the axioms of triangular norms, it is easy to see that 

$$\alpha \circledast \beta \le \min(\alpha, \beta),$$

showing that the minimum function, itself a T-norm, is the largest element in this class of operations.
Its minimal element, on the other hand, is the noncontinuous function $\circledast$ defined by

$$\alpha \circledast \beta = \begin{cases} \alpha, & \text{if } \beta = 1, \\ \beta, & \text{if } \alpha = 1, \\ 0, & \text{otherwise.} \end{cases}$$

Every symmetric and reflexive relation is $\circledast$-transitive for this triangular norm, which is, therefore,
of little practical utility. 

In what follows, we will also impose a most reasonable additional assumption of continuity of
$\circledast$ with respect to its arguments (i.e., why should there be a jump in the value of a lower bound
provided by $\circledast$ when the values of its arguments are slightly changed?). The class of continuous
T-norms does not have a minimal element, although under certain additional assumptions (requiring
T-norms to be also 2-copulas [37]), the inequality

$$\max(\alpha + \beta - 1, 0) \le \alpha \circledast \beta$$

also holds true, showing that certain important continuous T-norms lie between that of the $\aleph_1$-logic
of Lukasiewicz [17] and that of the original fuzzy logic proposed by Zadeh [53].
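
The ordering among these T-norms can be spot-checked numerically. The sketch below evaluates four operations on a coarse grid; it is a sanity check rather than a proof, and the product T-norm is an added example (it is a copula, so it falls between the two bounds) that the passage does not mention by name.

```python
def t_min(a, b):
    """Largest T-norm."""
    return min(a, b)

def t_product(a, b):
    """Product T-norm (illustrative addition)."""
    return a * b

def t_lukasiewicz(a, b):
    """T-norm of Lukasiewicz."""
    return max(a + b - 1.0, 0.0)

def t_drastic(a, b):
    """Minimal (noncontinuous) T-norm: nonzero only when one argument equals 1."""
    if b == 1.0:
        return a
    if a == 1.0:
        return b
    return 0.0

TOL = 1e-12  # guards against floating-point rounding
grid = [i / 20 for i in range(21)]

# drastic <= Lukasiewicz <= product <= min at every grid point
ordering_holds = all(
    t_drastic(a, b) <= t_lukasiewicz(a, b) + TOL
    and t_lukasiewicz(a, b) <= t_product(a, b) + TOL
    and t_product(a, b) <= t_min(a, b) + TOL
    for a in grid
    for b in grid
)
```

The drastic operation returns 0 whenever both arguments are below 1, which is why the transitivity bound it provides carries almost no information.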

Continuous triangular norms play a significant part in the theories of pattern recognition and 
automatic classification. The author [33] proposed the use of generalized similarity relations based 
on the T-norm of Lukasiewicz to generalize existing classification techniques—based on the mapping 
of a similarity function into a conventional equivalence relation—to the fuzzy domain—by mapping 
these T-norms (called likeness relations by Ruspini) into generalized fuzzy partitions. Bezdek and 
Harris [3] independently studied axiomatic approaches to cluster analysis based on the use of several 
continuous T-norms. 

The author has also studied [34] the possible relation between the multivalued logic and similarity 
related aspects of T-norms, and suggested that the degrees of similarity between two objects A and 
B may be regarded as the “degree of truth” of the vague proposition 

“A is similar to B.”

Having argued that $S$ should have the structure of a generalized equivalence relation, we will
assume, mainly for reasons of simplicity, that the function $S$ is the dual of a “true” distance, i.e.,
that

$$S(w, w') = 1 \text{ if and only if } w = w'.$$
This restriction, which is not substantial, is introduced primarily to assure that different possible
worlds may be distinguished by means of the function $S$. Otherwise, the equivalence relation that
relates two worlds $w$ and $w'$ if and only if $S(w, w') = 1$ may be used to partition our universe $U$ into
“indistinguishable” nonintersecting classes, indicating that our metric cannot discriminate between
significant differences in system state.

Before closing our presentation of generalized similarity relations, it is important to remark upon
the close relation between the notion of similarity and that of distance. If a function $\delta$ is defined in
terms of a similarity function $S$ by the simple relation

$$\delta = 1 - S,$$

then it is easy to see that the function $\delta$ has the properties of a metric or distance. This is evident
if the operation $\circledast$ corresponds to the T-norm of Lukasiewicz, since the transitivity condition is
equivalent to the well-known triangular inequality, i.e.,

$$\delta(w, w'') \le \delta(w, w') + \delta(w', w'').$$

If other T-norms are used, even stronger inequalities hold, with the so-called “ultrametric inequality” 

$$\delta(w, w'') \le \max(\delta(w, w'), \delta(w', w''))$$

being valid for the T-norm of Zadeh. In this case, each of the relations in the family $R_\alpha$ (known in
fuzzy set theory as the $\alpha$-cut 5 of the similarity $S$) is a conventional equivalence relation. This fact
was exploited, prior to the introduction of fuzzy set theory and fuzzy cluster analysis, by a variety 
of clustering procedures of the “single-link” type [22,40]. 
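
The link between min-transitive similarities, the ultrametric inequality, and equivalence classes can be illustrated on a small example. The four worlds and their similarity values below are hypothetical, chosen so that the similarity is transitive for the minimum T-norm.

```python
from itertools import product

worlds = ["a", "b", "c", "d"]

# Hypothetical min-transitive similarity: {a, b} and {c, d} are tight groups.
pairs = {
    ("a", "b"): 0.9, ("c", "d"): 0.7,
    ("a", "c"): 0.3, ("a", "d"): 0.3, ("b", "c"): 0.3, ("b", "d"): 0.3,
}
S = {(w, w): 1.0 for w in worlds}
for (w, v), s in pairs.items():
    S[(w, v)] = S[(v, w)] = s

def delta(w, v):
    """Distance obtained as the dual of the similarity: delta = 1 - S."""
    return 1.0 - S[(w, v)]

# Ultrametric inequality, checked over every triple of worlds.
ultrametric = all(
    delta(w, w2) <= max(delta(w, w1), delta(w1, w2)) + 1e-12
    for w, w1, w2 in product(worlds, repeat=3)
)

def alpha_cut_classes(alpha):
    """Classes of the relation S(w, v) >= alpha.

    Comparing each world with one representative per class is enough only
    because, for a min-transitive S, every alpha-cut is an equivalence relation.
    """
    classes = []
    for w in worlds:
        for cls in classes:
            if S[(w, cls[0])] >= alpha:
                cls.append(w)
                break
        else:
            classes.append([w])
    return classes

fine = alpha_cut_classes(0.8)
medium = alpha_cut_classes(0.5)
coarse = alpha_cut_classes(0.2)
```

Lowering the threshold merges classes without ever splitting them, which gives the hierarchy of partitions that single-link clustering procedures exploit.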

3.2 Possible and Necessary Similarity 

Our semantic formalization requires the introduction of constructs to indicate the extent by
which a concept exemplifies, illustrates, or is an adequate model of another concept. Our interpretations
shall, therefore, be oriented toward characterization of the degree by which a concept can
be said to be a good example of another concept with the purpose of defining vague concepts by 
means of measures of proximity between defined and defining concepts. In our treatment, each of 
the multiple “definiens” will be a conventional proposition corresponding to a subset of possible 
worlds. It is conceivable, however, that new vague concepts might also be described by indicating 
their metric relations to other vague concepts. 

The required constructs are based on the idea that whenever p and q are propositions such that 
$p \Rightarrow q$, then any $p$-world is an “example” of a $q$-world. This basic notion will be generalized by the
introduction of modal structures that define to what degree possible worlds that satisfy a certain 
proposition q fit a vague concept. Some of those possible worlds are “paradigmatic” of the vague 
concept, i.e., they fit it to a degree equal to 1 in the same sense that we may say, for example, in an 
absolute (i.e., nongraded) sense that somebody whose height is 7 ft is definitely “tall.” If we use a 
notion of graded fitness, however, certain worlds will fit the concept to a degree, i.e., they resemble 
(or are similar) to some paradigmatic example of the vague concept. 

The conventional interpretation of possibility needs to be modified, therefore, to capture the idea 
that a particular possible world is similar in some degree to another world that satisfies a “reference” 
proposition. 

5 The $\alpha$-cut of a fuzzy set $\mu : U \mapsto [0,1]$ is the conventional set of all points $w$ such that $\mu(w) \ge \alpha$. A similar
concept is defined for relations as subsets of a product space $U \times V$.
More generally, however, we will be interested in relations of similarity between pairs of subsets of
possible worlds rather than between pairs of possible worlds. This requirement complicates matters 
considerably since we will be forced to consider both the “validity” of a proposition p in some world 
where another proposition q is true, as well as its applicability in every world where q is true. In 
the former case, we will care about the existence of $q$-worlds that are similar to some degree to some
p-world, while in the latter we will be concerned with the size of the minimum neighborhood of p 
(as a subset of the universe U ) that fully encloses the subset q. 

This dual concern for what may possibly apply and what must necessarily hold—an essential 
aspect of modal logic—is typical of situations where relationships between ensembles of objects are 
described in terms of relations between their members. In the probability calculus, for example, 
knowledge of probabilities over certain families of subsets provides “sharp” upper and lower bounds 
(called inner and upper probabilities, respectively) for the probabilities of other subsets, an important
fact in the extension of set measures to larger domains [19]. The role and properties of these
bounds in the Dempster-Shafer calculus of evidence is well-known, having been described in the 
original paper of Dempster [8], related to concepts of modal logic by Ruspini [35], and being also the 
subjects of considerable formal study [7] as mathematical structures. 

Analogies between the role of probabilistic bounds (i.e., bounds for probability values) and possibility/necessity
distributions (shown below to play a similar part with respect to metric
structures) have been the source of much of the confusion about the need for possibilistic schemes.
Each upper/lower-bound pair, however, leads to a substantially different description of the nature of a subset
of possible worlds, being, in either case, measures that arise naturally when pointwise properties are
extended to set partitions. General properties of these measures have been studied by Dubois and
Prade [11] in the context of approximate reasoning and in other regards by Pawlak [30].

Our generalizations of the notions of possibility and necessity are related to the so-called de re [21] 
interpretation of the statement “If q, then p is possible” as the modal propositional relation 

$$ q \Rightarrow \Pi p. $$

We will say that the proposition q implies, or is a necessary model of, the proposition p to the 
degree $\alpha$ if and only if, for every q-world w, there exists a p-world w' that is at least 
$\alpha$-similar to it (i.e., $S(w, w') \ge \alpha$), or, equivalently, whenever 

$$ q \Rightarrow \Pi_\alpha p. $$

Similarly, we will say that the proposition q is consistent with, or is a possible model of, the 
proposition p to the degree $\alpha$ (see footnote 6) if and only if there exist a q-world w and a 
p-world w' that are at least $\alpha$-similar, or, equivalently, whenever 

$$ \neg\,(p \Rightarrow \neg\,\Pi_\alpha q). $$

The similarity function that we have introduced in the universe U provides us with a simple 
mechanism to quantify both the extent of “inclusion” and that of the “intersection” between pairs 
of subsets of possible worlds (see footnote 7). 

6 Note that our characterizations of both possibility and necessity distributions are based on the 
modal possibility operators $\Pi_\alpha$. 

7 For reasons that by now should be evident, we will not need to introduce a concept of “unconditioned 
possibility,” although it would be easy to do so using q = U. Being concerned with the power of certain 
propositions to exemplify other conditions, we will not have much occasion to deal with the strength of 
tautologies in that regard. 
3.3 Possibilistic Implication and Consistence 

The notion of subset inclusion and its related concept of set identity are of central importance in 
deductive logic, since subsets of possible worlds are formally equivalent to propositions with subset 
inclusion and identity corresponding to logical implication and equivalence, respectively. These 
propositional relationships are the basis of derivation rules such as the modus ponens. The notion 
of intersection plays a similar role in modal analyses because of its ability to express the potential 
validity of a statement. 

Classical accounts, however, recognize only two “degrees” of inclusion corresponding to the cases 
when either a set q is a subset of another set p or it is not, with a similar dichotomy applying to 
degrees of intersection. Our generalization exploits the metric structures defined between sets of 
possible worlds by introducing measures that describe a subset as enclosed in a neighborhood (of some 
size) of another set while intersecting another of its neighborhoods (of “smaller” size; see footnote 8). 
The problem of measuring the “size” of those neighborhoods is the subject of our immediate considerations. 

3.3.1 Degree of Implication 

Our definition of partial implication between propositions was based on conditions that determine 
whether, given two propositions p and q, one of them implies the other to some degree $\alpha$. In 
particular, since every world w is always similar in a degree that is at least equal to zero to any 
other world w', it is always true that any proposition q implies any other proposition p to the degree 
zero. It is often the case, however, that the degree of implication between p and q is at least equal 
to some positive value $\alpha$. 

If we want to generalize procedures based on inclusion relationships, such as the modus ponens, 
in an efficient fashion, we will need to measure the “optimal” (or maximum) value of the parameter 
$\alpha$ such that q implies p to the degree $\alpha$. This value is a measure of the degree by which 
the set of all p-worlds must be “stretched” to encompass the set of all q-worlds. The least upper bound 
of the values $\alpha$ for which every q-world w' is at least $\alpha$-similar to some p-world w 
(depending, in general, on w') is given by the degree of implication function: 

Definition: The degree of implication of p by q is the value 

$$ I(p \mid q) = \inf_{w' \vdash q} \; \sup_{w \vdash p} S(w, w'). $$


Defined in this way, the degree of implication $I(p \mid q)$ is a measure of the “minimal amount” of 
stretching required to reach a p-world from any q-world, in the sense that if $\beta < I(p \mid q)$, then 

$$ q \Rightarrow \Pi_\beta p. $$

Furthermore, $I(p \mid q)$ is the least upper bound of the real values $\beta$ for which the above 
statement may be made. 
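
The definition can be illustrated on a small finite model. The following is only a minimal sketch under stated assumptions: worlds are taken to be points of the real line, a proposition is represented by the finite list of worlds where it holds, and the similarity is assumed to be S(w, w') = max(0, 1 - |w - w'|). The names `similarity` and `degree_of_implication` are hypothetical and do not appear in the report.

```python
def similarity(w, v):
    """Assumed similarity between two real-valued worlds: 1 minus their distance, floored at 0."""
    return max(0.0, 1.0 - abs(w - v))


def degree_of_implication(p_worlds, q_worlds):
    """I(p | q): infimum over q-worlds of the supremum over p-worlds of the similarity."""
    return min(max(similarity(w, v) for w in p_worlds) for v in q_worlds)


p = [0.0, 0.25, 0.5, 0.75]   # worlds where p holds (illustrative values)
q = [0.5, 0.75]              # worlds where q holds; every q-world is also a p-world

print(degree_of_implication(p, q))  # 1.0: no stretching of p is needed to cover q
print(degree_of_implication(q, p))  # 0.5: q must be stretched to reach the p-world 0.0
```

In this toy model the classical inclusion of q in p yields the maximal degree 1, while the converse degree is strictly smaller, which already shows the nonsymmetric character of I.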

As the following theorem makes clearer, this function provides the basis for the generalization 
of the modus ponens. This truth-derivation procedure may be thought of as an expression of the 
nesting relationships that hold between the sizes of neighborhoods of such subsets. 

8 It is important to recall that, due to our reliance on similarity rather than on the dual notion of 
dissimilarity or distance, high values of $\alpha$ correspond to low values of “stretching” or to 
smaller set neighborhoods. 
Theorem: The degree of implication function, 

$$ I : \mathcal{L} \times \mathcal{L} \to [0, 1], $$

has the following properties: 

(i) If $p \Rightarrow r$, then $I(p \mid q) \le I(r \mid q)$. 

(ii) If $q \Rightarrow r$, then $I(p \mid q) \ge I(p \mid r)$. 

(iii) $I(p \mid q) \ge I(p \mid r) \circledast I(r \mid q)$, 

where p, q, and r are any satisfiable propositions. 

Proof: The first two properties are an immediate consequence of the definition of degree of 
implication. To prove the third, observe that by definition of similarity 

$$ S(w, w') \ge S(w, w'') \circledast S(w'', w') $$

for any worlds w, w', and w''. 

Taking the supremum on both sides of this inequality with respect to all worlds $w \vdash p$, it follows, 
because $\circledast$ is continuous, that 

$$ \sup_{w \vdash p} S(w, w') \ge \Big[ \sup_{w \vdash p} S(w, w'') \Big] \circledast S(w'', w'). $$

Since this expression is true, in particular, for all worlds $w'' \vdash r$, it is true that 

$$ \sup_{w \vdash p} S(w, w') \ge \Big[ \inf_{w'' \vdash r} \sup_{w \vdash p} S(w, w'') \Big] \circledast S(w'', w') = I(p \mid r) \circledast S(w'', w'), $$

where w'' is any world such that $w'' \vdash r$. 

From this inequality, it follows, since $\circledast$ is continuous, that 

$$ \sup_{w \vdash p} S(w, w') \ge I(p \mid r) \circledast \Big[ \sup_{w'' \vdash r} S(w'', w') \Big]. $$

Taking now the infimum on both sides of this expression over all worlds w' such that $w' \vdash q$, it is 
easy to see, using again the continuity of $\circledast$, that 

$$ \inf_{w' \vdash q} \sup_{w \vdash p} S(w, w') \ge I(p \mid r) \circledast \Big[ \inf_{w' \vdash q} \sup_{w'' \vdash r} S(w'', w') \Big], $$

proving the $\circledast$-transitivity of I. ∎ 
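
The transitivity property (iii) can be checked exhaustively on a small example. The sketch below is an illustration only, not part of the original proof: it assumes worlds on the real line, the similarity S(w, w') = max(0, 1 - |w - w'|) (which is transitive with respect to the Lukasiewicz norm), and propositions given as nonempty tuples of worlds. All function names are hypothetical.

```python
from itertools import combinations


def t_norm(a, b):
    """Lukasiewicz triangular norm (an assumed choice for this example)."""
    return max(a + b - 1.0, 0.0)


def similarity(w, v):
    """Assumed similarity: 1 minus the distance between worlds, floored at 0."""
    return max(0.0, 1.0 - abs(w - v))


def implication(p, q):
    """Degree of implication I(p | q) for finite sets of worlds."""
    return min(max(similarity(w, v) for w in p) for v in q)


worlds = [0.0, 0.25, 0.5, 0.75, 1.0]
# Every nonempty set of at most three worlds plays the role of a satisfiable proposition.
propositions = [c for size in (1, 2, 3) for c in combinations(worlds, size)]

# Collect every triple (p, q, r) that would contradict I(p|q) >= I(p|r) (*) I(r|q).
violations = [
    (p, q, r)
    for p in propositions
    for q in propositions
    for r in propositions
    if implication(p, q) < t_norm(implication(p, r), implication(r, q))
]
print(len(propositions), len(violations))  # 25 0
```

The world coordinates are multiples of 0.25 so that all floating-point operations are exact and the comparison is not affected by rounding.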

Note that, since $I(q \mid q) = 1$ for any proposition q, the following statement is also true: 

Corollary: If p and q are propositions in $\mathcal{L}$, then 

$$ I(p \mid q) = \sup_{r} \, \big[ I(p \mid r) \circledast I(r \mid q) \big]. $$

Notice also that if $I(p \mid q) = 1$, then 

$$ \sup_{w \vdash p} S(w, w') = 1, \quad \text{for all } w' \vdash q. $$

Under minimal assumptions (assuring that the supremum operation is actually a maximization), 
this relation is equivalent to stating that q strongly implies p, or that any q-world is also a p-world. 

The nonsymmetric function I measures the extent by which every world w' in a certain class 
resembles some world w (dependent on w') in a reference class, possibly explicating the nature of 
the nonsymmetric assessments [45] found in psychological experimentation when subjects are asked 
to evaluate the degree by which an object “resembles” another. The results obtained in those 
experiments suggest that human beings, when assessing similarity between objects, use one of them 
(or a class of similar objects) as a reference landmark to describe the other. Such asymmetries might 
be explained by noticing that, in general, $I(p \mid q) \neq I(q \mid p)$, indicating that the stronger 
stimulus might generally be used to construct a reference class, which is then used to describe other stimuli. 

The degree of implication of one proposition by another can be readily used to generate a measure 
of similarity between propositions that generalizes our original measure of similarity between possible 
worlds: 

$$ \hat{S}(p, q) = \min \big[ I(p \mid q), \; I(q \mid p) \big], $$

quantifying the degree by which the propositions p and q are equivalent. 

It may be readily proved [44], from its definition and from the transitivity property of I, that 
$\hat{S}$ is a reflexive, symmetric, and $\circledast$-transitive function between subsets of possible 
worlds. This similarity function is the dual of the well-known Hausdorff distance, defined between 
subsets of a metric space as a function of the distance between pairs of their members [9], which is 
given by the expression 

$$ \delta(A, B) = \max \Big( \sup_{x \in A} \inf_{y \in B} \delta(x, y), \; \sup_{y \in B} \inf_{x \in A} \delta(x, y) \Big). $$
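
The duality with the Hausdorff distance can be made concrete in a toy setting. The sketch below assumes, purely for illustration, worlds on the real line with distance |w - w'| never exceeding 1 and similarity 1 minus that distance; under this assumption the proposition similarity equals one minus the Hausdorff distance. The function names are hypothetical.

```python
def similarity(w, v):
    """Assumed similarity: 1 minus the distance between worlds, floored at 0."""
    return max(0.0, 1.0 - abs(w - v))


def implication(p, q):
    """Degree of implication I(p | q) for finite sets of worlds."""
    return min(max(similarity(w, v) for w in p) for v in q)


def proposition_similarity(p, q):
    """Symmetric similarity between propositions: min of the two degrees of implication."""
    return min(implication(p, q), implication(q, p))


def hausdorff_distance(a, b):
    """Hausdorff distance between two finite sets of real numbers."""
    forward = max(min(abs(x - y) for y in b) for x in a)
    backward = max(min(abs(x - y) for x in a) for y in b)
    return max(forward, backward)


p = [0.0, 0.25, 0.5]
q = [0.5]
print(proposition_similarity(p, q))    # 0.5
print(1.0 - hausdorff_distance(p, q))  # 0.5
```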

The result expressed by the transitive property of the degree of implication may be stated using 
modal notation in the form 

$$ q \Rightarrow \Pi_\alpha r \ \text{ and } \ r \Rightarrow \Pi_\beta p \ \text{ imply that } \ q \Rightarrow \Pi_{\alpha \circledast \beta}\, p, $$

as the simplest form of the generalized modus ponens rule of Zadeh. 

The relationship between this rule and the classical modus ponens is easier to perceive if it is 
remembered that classical conditional propositions of the form “If q, then p” simply state that the 
set of q-worlds is a subset of the set of p-worlds. Such relationships of inclusion may also be described 
in metric terms by saying that every q-world has a p-world (i.e., itself) that is as similar as possible 
to it. 

Logic structures, however, only allow us to say that either q implies p or that q implies its negation 
$\neg p$, or that neither of those statements is true. By contrast, similarity relations allow measurement 
of the amount by which a set must be “stretched” (as illustrated in Figure 1) to enclose another 
set. Using such metrics, we may describe the generalized modus ponens as a relation between the 
stretching required to reach p from any point of the set r, the stretching required to reach r from 
any point of the set q, and the stretching required to reach p from any point of the set q. 

In Section 5 we will derive alternative expressions for the generalized modus ponens that allow 
us to propagate both measures characterizing degree of implication and degree of consistence, a dual 
concept that plays, with respect to the notion of possibility, the function that is fulfilled by the 
degree of implication function with respect to necessity. In those derivations, by introduction of 
sharper bounds for certain conditional concepts, we will also be able to improve the quality of the 
bounds provided by generalized modus ponens rules while being closer in spirit to its usual fuzzy-logic 
formulation. 

Figure 1: The Generalized Modus Ponens. 

3.3.2 Degree of Consistence 

A notion that is dual to that of degree of implication is given by a function that measures the 
pointwise proximity between pairs of possible worlds from an “optimistic” point of view, characterizing 
the degree by which statements that are true in some worlds may apply in others. By contrast, the 
degree of implication measures the extent by which statements that are true in p-worlds must hold 
in q-worlds. 

Definition: The degree of consistence of p and q is the value 

$$ C(p \mid q) = \sup_{w' \vdash q} \; \sup_{w \vdash p} S(w, w'). $$


An immediate consequence of this definition is that $C(\cdot \mid \cdot)$ is a symmetric function that 
is nondecreasing in both arguments (with respect to the $\Rightarrow$ order). It is also easy to see 
that the values of the degree of consistence function are never smaller than the corresponding values 
of the degree of implication function, 

$$ I(p \mid q) \le C(p \mid q), $$

as the amount of stretching required to reach p from some “convenient” q-world is smaller (i.e., 
higher values of S) than that required to reach p from any q-world. In general, however, the degree 
of consistence function is not transitive, preventing the statement of a “compatibility” counterpart of 
the generalized modus ponens rule. Its relationship with the degree of implication function, expressed 
by the identity 

$$ C(p \mid q) = \sup_{w' \vdash q} I(p \mid w') = \sup_{w \vdash p} I(q \mid w), $$

will permit us, nonetheless, to derive a useful bound-propagation expression. 
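
The relation between the two degrees can be checked numerically. The following sketch is an illustration under assumptions that are not part of the report: worlds are points of the real line, the similarity is max(0, 1 - |w - w'|), and single worlds are treated as one-element propositions. The function names are hypothetical.

```python
def similarity(w, v):
    """Assumed similarity: 1 minus the distance between worlds, floored at 0."""
    return max(0.0, 1.0 - abs(w - v))


def implication(p, q):
    """Degree of implication I(p | q): every q-world must be close to some p-world."""
    return min(max(similarity(w, v) for w in p) for v in q)


def consistence(p, q):
    """Degree of consistence C(p | q): some q-world is close to some p-world."""
    return max(similarity(w, v) for w in p for v in q)


p = [0.0, 0.25]
q = [0.5, 1.0]

print(implication(p, q))                    # 0.25
print(consistence(p, q))                    # 0.75
print(max(implication(p, [v]) for v in q))  # 0.75, sup over q-worlds of I(p | w')
print(max(implication(q, [w]) for w in p))  # 0.75, sup over p-worlds of I(q | w)
```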
4 POSSIBILITY AND NECESSITY DISTRIBUTIONS 


This section presents interpretations of the major constructs of fuzzy logic (possibility and necessity 
distributions) in terms of similarity-based structures. Possibility and necessity distributions are 
functions that measure the proximity of either all or some of the worlds in the evidential set to 
worlds in other sets that are employed as reference landmarks. 

The role played by possibility and necessity distributions is similar to that performed by lower 
and upper bounds of probability distributions (or by the belief and plausibility functions of the 
Dempster-Shafer calculus of evidence) with respect to probability distributions. The essential 
difference between these bounds and those provided by possibility/necessity pairs lies in the fundamentally 
dissimilar character of what is being bounded: metric structures relating pairs of worlds in one case; 
measures of set size in the other. Furthermore, in the model of possibilistic structures that is 
presented in this note, necessity (possibility) distributions are any lower (upper) bounds of certain 
metric functions rather than their “best” or “sharpest” bounds. The operations of fuzzy logic allow 
computation of bounds for some of these measures as a function of bounds of other measures. 

4.1 Inverse of a Triangular Norm 

When working in ordinary metric spaces, it is often convenient to express the conventional statement 
of the triangular inequality, i.e., 

$$ \delta(w, w') \le \delta(w, w'') + \delta(w'', w'), $$

in the equivalent form 

$$ \delta(w, w') \ge |\, \delta(w, w'') - \delta(w', w'') \,|, $$

which utilizes a form of inverse (i.e., the subtraction operator $-$) of the function used to express 
the original inequality (i.e., the addition operator $+$). This notion of inverse may be directly 
generalized [37] to provide us with the tools required to define possibility and necessity functions and to 
derive useful forms of the generalized modus ponens involving either type of these constructs. 

Definition: If $\circledast$ is a triangular norm, its pseudoinverse $\oslash$ is the function defined 
over pairs of numbers in the unit interval of the real line by the expression 

$$ a \oslash b = \sup \{\, c : b \circledast c \le a \,\}. $$

From this definition it is clear that $a \oslash b$ is nondecreasing in a and nonincreasing in b. 
Furthermore, $a \oslash 0 = 1$ and $a \oslash 1 = a$ for any a in [0,1]. Other important properties of the 
pseudoinverse function are given in the works of Schweizer and Sklar [37], Trillas and Valverde [43], 
and Valverde [44]. 

Examples of the pseudoinverses of important triangular norms are given in Table 1 together with 
the corresponding conorms. 
Table 1: Triangular Norms, Conorms, and Pseudoinverses 

Name          T-norm a ⊛ b         Conorm a ⊕ b      Pseudoinverse a ⊘ b 
-----------   ------------------   ---------------   ---------------------------- 
Lukasiewicz   max(a + b - 1, 0)    min(a + b, 1)     min(1 + a - b, 1) 
Product       ab                   a + b - ab        a/b, if b > a; 1, otherwise 
Zadeh         min(a, b)            max(a, b)         a, if b > a; 1, otherwise 
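
The entries of Table 1 can be checked against the defining property of the pseudoinverse. The sketch below is an illustration only: it assumes exact rational arithmetic on a small grid of the unit interval and tests the equivalence "b ⊛ c ≤ a if and only if c ≤ a ⊘ b", which characterizes the pseudoinverse for the three norms of the table. The dictionary and function names are hypothetical.

```python
from fractions import Fraction

ONE = Fraction(1)
ZERO = Fraction(0)

# Triangular norms of Table 1.
T_NORMS = {
    "lukasiewicz": lambda a, b: max(a + b - 1, ZERO),
    "product": lambda a, b: a * b,
    "zadeh": lambda a, b: min(a, b),
}

# Corresponding pseudoinverses of Table 1 (b > a >= 0 guarantees b != 0 in the division).
PSEUDOINVERSES = {
    "lukasiewicz": lambda a, b: min(1 + a - b, ONE),
    "product": lambda a, b: a / b if b > a else ONE,
    "zadeh": lambda a, b: a if b > a else ONE,
}

grid = [Fraction(k, 8) for k in range(9)]  # 0, 1/8, ..., 1


def adjunction_holds(name):
    """Check that (b (*) c <= a) is equivalent to (c <= a (/) b) on the whole grid."""
    t, inv = T_NORMS[name], PSEUDOINVERSES[name]
    return all(
        (t(b, c) <= a) == (c <= inv(a, b))
        for a in grid for b in grid for c in grid
    )


for name in T_NORMS:
    inv = PSEUDOINVERSES[name]
    boundary_ok = all(inv(a, ZERO) == ONE and inv(a, ONE) == a for a in grid)
    print(name, adjunction_holds(name), boundary_ok)  # each line ends with: True True
```

Rational numbers are used instead of floating-point values so that the quotient a/b of the product norm is compared exactly.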


4.2 Unconditioned Necessity Distributions 

We introduce first a family of functions that bound from below the value of the similarity between 
any evidential world in $\mathcal{E}$ and some world where another proposition p is true. These 
unconditioned necessity distributions are lower bounds for values of the degree of implication 
$I(p \mid \mathcal{E})$, which measures the extent by which statements that are true in a reference set 
(i.e., the subset of p-worlds) must hold in the evidential set. 

As observed before, whenever $I(p \mid \mathcal{E}) = 1$, it is true, under minimal assumptions, that the 
evidential subset $\mathcal{E}$ is a subset of the set of all p-worlds, or that p necessarily holds in 
$\mathcal{E}$. If, on the other hand, $I(p \mid \mathcal{E}) = \alpha < 1$, then p must be stretched a certain 
amount (with smaller $\alpha$ corresponding to larger stretching) in order for one of its neighborhoods 
to encompass $\mathcal{E}$. 

Definition: If $\mathcal{E}$ is an evidential set, then a function Nec(·) defined over propositions in the 
language $\mathcal{L}$ is called an unconditioned necessity distribution for $\mathcal{E}$ if 

$$ \mathrm{Nec}(p) \le I(p \mid \mathcal{E}). $$

4.3 Unconditioned Possibility Distributions 

The dual counterpart of the unconditioned necessity distribution is provided by upper bounds of 
the degree of consistence $C(p \mid \mathcal{E})$. Whenever $C(p \mid \mathcal{E}) = 1$, it is easy to see that, 
under minimal assumptions, there exists a p-world w that is in the evidential set $\mathcal{E}$ or, 
equivalently, that p (for all we know) is possibly true. If, on the other hand, 
$C(p \mid \mathcal{E}) = \alpha < 1$, then there exists a neighborhood (of “size” $\alpha$) of some p-world 
that intersects the evidential set. 

Definition: If $\mathcal{E}$ is an evidential set, then a function Poss(·) defined over propositions in the 
language $\mathcal{L}$ is called an unconditioned possibility distribution for $\mathcal{E}$ if 

$$ \mathrm{Poss}(p) \ge C(p \mid \mathcal{E}). $$

Since the value Poss(p) of any possibility function Poss(·) is an upper bound of the value 
$C(p \mid \mathcal{E})$ of the degree of consistence, while the corresponding value Nec(p) of any necessity 
function Nec(·) is a lower bound of $I(p \mid \mathcal{E})$, it follows that values of a possibility function 
can never be smaller than the corresponding values of any necessity function, i.e., that 

$$ \mathrm{Nec}(p) \le \mathrm{Poss}(p). $$
4.4 Properties of Possibility and Necessity Distributions 

In this subsection we will develop similarity-based interpretations for some basic formulae of 
possibilistic calculus. These expressions may be thought of as mechanisms that allow the extension of a 
partially known possibility distribution. For example, the property that 

$$ \max(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge C(p \lor q \mid \mathcal{E}), $$

which is proved below, is the similarity interpretation of the standard rule that allows computation 
of the possibility value of a disjunction in fuzzy logic, i.e., 

$$ \mathrm{Poss}(p \lor q) = \max(\mathrm{Poss}(p), \mathrm{Poss}(q)). $$


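Theorem: If p and q are propositions, and if the quantities Poss(p), Poss(q), Nec(p), and Nec(q) 
are such that 

$$ \mathrm{Nec}(p) \le I(p \mid \mathcal{E}), \qquad \mathrm{Nec}(q) \le I(q \mid \mathcal{E}), $$

$$ \mathrm{Poss}(p) \ge C(p \mid \mathcal{E}), \qquad \mathrm{Poss}(q) \ge C(q \mid \mathcal{E}), $$

then the following statements (similarity-based interpretations of the basic laws of fuzzy logic) are 
valid: 

$$ \max(\mathrm{Nec}(p), \mathrm{Nec}(q)) \le I(p \lor q \mid \mathcal{E}), $$
$$ \max(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge C(p \lor q \mid \mathcal{E}), $$
$$ \min(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge C(p \land q \mid \mathcal{E}). $$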

Proof: Note first that since $C(\cdot \mid \cdot)$ is nondecreasing (with respect to the $\Rightarrow$ 
order) in its arguments, it is true that 

$$ \mathrm{Poss}(p) \ge C(p \mid \mathcal{E}) \ge C(p \land q \mid \mathcal{E}), $$

$$ \mathrm{Poss}(q) \ge C(q \mid \mathcal{E}) \ge C(p \land q \mid \mathcal{E}), $$

whenever $p \land q$ is satisfiable, from which it is easy to see that 

$$ \min(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge C(p \land q \mid \mathcal{E}). $$

The corresponding result is obvious when $p \land q$ is nonsatisfiable. 

A similar argument shows, for necessity functions, that 

$$ \max(\mathrm{Nec}(p), \mathrm{Nec}(q)) \le I(p \lor q \mid \mathcal{E}). $$

To prove the disjunctive law for possibilities, notice that if f is any function mapping elements 
of a general domain D into real numbers, then 

$$ \sup \{ f(d) : d \in A \cup B \} = \max \big[ \sup \{ f(d) : d \in A \}, \; \sup \{ f(d) : d \in B \} \big]. $$

From this equality, it is easy to see that if Poss(p) and Poss(q) are upper bounds of 
$C(p \mid \mathcal{E})$ and $C(q \mid \mathcal{E})$, respectively, then 

$$ \max(\mathrm{Poss}(p), \mathrm{Poss}(q)) \ge C(p \lor q \mid \mathcal{E}), $$

completing the proof of the theorem. ∎ 

Note, however, that another law commonly given as an axiom for necessity functions does not hold 
in our interpretation. As illustrated in Figure 2, the distance from a point to the intersection 
of two sets may be strictly larger than the distances to either set (i.e., the similarity will be strictly 
smaller). In general, therefore, it is 

$$ \min(\mathrm{Nec}(p), \mathrm{Nec}(q)) \not\le I(p \land q \mid \mathcal{E}), $$

making invalid, under this interpretation, the conjunctive law for necessities [11] 

$$ \mathrm{Nec}(p \land q) = \min(\mathrm{Nec}(p), \mathrm{Nec}(q)). $$



Figure 2: Failure of Conjunctive Necessity. 
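
A one-dimensional counterexample in the spirit of Figure 2 can be written down directly. The sketch below is an illustration under assumptions not made in the report: worlds are points of the real line, the similarity is max(0, 1 - |w - w'|), propositions are sets of worlds with conjunction as set intersection, and the necessity values are taken to be the sharpest admissible ones. All names are hypothetical.

```python
def similarity(w, v):
    """Assumed similarity: 1 minus the distance between worlds, floored at 0."""
    return max(0.0, 1.0 - abs(w - v))


def implication(p, evidence):
    """Degree of implication I(p | evidence) for finite sets of worlds."""
    return min(max(similarity(w, v) for w in p) for v in evidence)


evidence = {0.5}          # a single evidential world
p = {0.25, 1.0}           # p-worlds
q = {0.75, 1.0}           # q-worlds
p_and_q = p & q           # the conjunction holds only in the world 1.0

nec_p = implication(p, evidence)         # sharpest necessity value for p
nec_q = implication(q, evidence)         # sharpest necessity value for q
i_conj = implication(p_and_q, evidence)  # degree of implication of the conjunction

print(min(nec_p, nec_q), i_conj)  # 0.75 0.5
```

Here the minimum of the two necessity values exceeds the degree of implication of the conjunction, so it cannot serve as a lower bound for it.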

We may also note in this regard that the similarity-based model that is discussed here does not 
make use of the notion of negation either as a mechanism to generate dual concepts or in its own 
right as an important logical concept. It is the intent of the author to study, in the immediate future, 
alternative models where notions of negation and maximal dissimilarity play more substantive roles. 

4.5 Conditional Possibilities and Necessities 

The concepts of conditional possibility and necessity are closely related to the previously introduced 
unconditioned structures. These structures may be thought of as a characterization of the proximity 
of a world w to some or all of the worlds where a proposition p is true, given that w is similar in 
the degree 1 to the evidential set $\mathcal{E}$ (i.e., $w \vdash \mathcal{E}$). With this fact in mind, we 
could have used the somewhat baroque formulation 

$$ C(p \mid \mathcal{E}) = \sup_{w \vdash \mathcal{E}} \big[ I(p \mid w) \oslash I(\mathcal{E} \mid w) \big] $$

to define unconditioned possibility distributions, a rather unnecessary effort if we consider that 
$I(\mathcal{E} \mid w) = 1$ whenever $w \vdash \mathcal{E}$, showing its obvious equivalence to the simpler 
form used in Section 3.3.2 above. In spite of such observation, the above identity is important in 
understanding the purpose of the definitions given below. Those definitions interpret conditional 
possibilities and necessities as a measure of the proximity of worlds in the evidential set $\mathcal{E}$ 
to (some or all) worlds satisfying a (conditioned) proposition q relative to their proximity to (some or 
all) the worlds that satisfy another (conditioning) proposition p. 

The mechanism used to specify that relationship, which is closely related in spirit to results of 
Valverde [44] on the structure of indistinguishability relations, is based on the pseudoinverse function 
introduced in Section 4.1. The basic idea used by these definitions is also illustrated in Figure 3, 
where, from the perspective of the evidential world w, the similarity between the p-world u and the 
q-world v is estimated by means of an inequality that generalizes the “absolute value” form of the 
triangular inequality, i.e., 

$$ \delta(u, v) \ge |\, \delta(u, w) - \delta(v, w) \,|, $$

to its similarity-based form 

$$ S(u, v) \le \min \big[ S(u, w) \oslash S(v, w), \; S(v, w) \oslash S(u, w) \big]. $$



Figure 3: Similarities as Viewed from the Evidential Set. 

The required interplay between similarities to conditioning and conditioned sets is captured by 
the following definitions. 

Definition: Let $\mathcal{E}$ be an evidential set. A function Nec(·|·) mapping pairs of propositions in 
the language $\mathcal{L}$ into [0,1] is called a conditional necessity distribution for $\mathcal{E}$ if 

$$ \mathrm{Nec}(q \mid p) \le \inf_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash I(p \mid w) \big], $$

for any propositions p and q in $\mathcal{L}$. 

Definition: Let $\mathcal{E}$ be an evidential set. A function Poss(·|·) mapping pairs of propositions in 
the language $\mathcal{L}$ into [0,1] is called a conditional possibility distribution for $\mathcal{E}$ if 

$$ \mathrm{Poss}(q \mid p) \ge \sup_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash I(p \mid w) \big], $$

for any propositions p and q in $\mathcal{L}$. 

It is easy to see, from these definitions, that the values of a conditional necessity distribution are 
never larger than the corresponding values of any conditional possibility distribution, i.e., 

$$ \mathrm{Nec}(q \mid p) \le \mathrm{Poss}(q \mid p). $$

Furthermore, since $I(\cdot \mid \cdot)$ is $\circledast$-transitive, then 

$$ I(q \mid w) \ge I(q \mid p) \circledast I(p \mid w) $$

for every evidential world w. From this inequality and the definition of pseudoinverse of a triangular 
norm, it is easy to see that the sharpest conditional necessity function satisfies the inequality 

$$ \inf_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash I(p \mid w) \big] \ge I(q \mid p), $$

i.e., the bounds for necessity functions provided by the evidential-set perspective are stronger than 
those that can be obtained by direct use of the degree of implication function (see footnote 9). 
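
The two conditional bounds can be computed directly in a toy model. The sketch below is an illustration under assumptions that are not part of the report: worlds are points of the real line, the similarity is max(0, 1 - |w - w'|), and the Lukasiewicz pseudoinverse of Table 1 is used. The function names are hypothetical.

```python
def similarity(w, v):
    """Assumed similarity: 1 minus the distance between worlds, floored at 0."""
    return max(0.0, 1.0 - abs(w - v))


def implication(p, q):
    """Degree of implication I(p | q) for finite sets of worlds."""
    return min(max(similarity(w, v) for w in p) for v in q)


def pseudoinverse(a, b):
    """Pseudoinverse of the Lukasiewicz norm (an assumed choice of norm)."""
    return min(1.0 + a - b, 1.0)


evidence = [0.25, 0.5]   # evidential worlds
p = [0.0, 0.25]          # conditioning proposition
q = [0.75]               # conditioned proposition

# One term I(q | w) (/) I(p | w) per evidential world w.
terms = [pseudoinverse(implication(q, [w]), implication(p, [w])) for w in evidence]
sharpest_nec = min(terms)    # largest admissible value of Nec(q | p)
sharpest_poss = max(terms)   # smallest admissible value of Poss(q | p)

print(sharpest_nec, sharpest_poss, implication(q, p))  # 0.5 1.0 0.25
```

In this example the evidential-set bound 0.5 is indeed sharper than the degree of implication I(q | p) = 0.25 computed without reference to the evidence.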

Note also that if Nec(p) = 1, indicating that $I(p \mid \mathcal{E}) = 1$, and if Nec(q|p) = 1, then the 
above definition of conditional necessity shows that $I(q \mid \mathcal{E}) = 1$, indicating that Nec(q) 
may be taken to be equal to 1, thus generalizing the well-known axiom (consequential closure) of certain 
modal systems (e.g., the system T, as discussed in Hughes and Cresswell [21]) 

$$ \text{If } Np \text{ and } N(p \rightarrow q), \text{ then } Nq. $$

The definitions above can also be further interpreted as a way to compare the similarities between 
evidential worlds and those in the conditioning and conditioned sets by noting that whenever 

$$ I(q \mid w) \ge I(p \mid w) $$

for every evidential world $w \vdash \mathcal{E}$, then Nec(q|p) may be chosen to be equal to 1. Similarly, 
if there exists some world $w \vdash \mathcal{E}$ where this inequality holds, then it is Poss(q|p) = 1. In 
either case, however, the maximum value for the conditional distribution (i.e., 1) is reached when the 
proximity of one evidential world w (in the case of possibilities) or of every one of them (in the case 
of necessities) to a world w' in the conditioned set is no smaller than the proximity of w to the 
conditioning set p. In either case, once again resorting to an apparent notational overkill, we may state 
this fact by means of the identity function $\tau$ in the unit interval: 

$$ \tau : [0, 1] \to [0, 1] : \alpha \mapsto \alpha, $$

in the form 

$$ I(q \mid w) \ge \tau(I(p \mid w)), $$

for some $w \vdash \mathcal{E}$ in the case of possibilities, with the same inequality holding for every 
$w \vdash \mathcal{E}$ in the case of necessities. We may, however, conceive of other functions 

$$ \gamma : [0, 1] \to [0, 1] : \alpha \mapsto \gamma(\alpha), $$

with $\gamma(\alpha) \ge \alpha$ to specify a stronger form of implication, as illustrated in Figure 4, i.e., 

$$ I(q \mid w) \ge \gamma(I(p \mid w)). $$

Similarly, one may also conceive of functions $\phi$ with $\phi(\alpha) \le \alpha$ that may be used to 
model weaker forms of implication. 

9 A dual inequality for possibilities involving $C(q \mid p)$ does not hold in general. It is easy to see, 
however, that $C(q \mid \mathcal{E}) \oslash I(p \mid \mathcal{E})$ is a possibility function for q given p. 



Figure 4: Examples of Possible Similarity Relationships between Conditioning and Conditioned Sets. 

Possibilistic calculi based on the propagation of truth-mappings of this type, first proposed by 
Baldwin [2], are utilized in the RUM [4,5] and MILORD [18] expert systems. The particular case 
when $\gamma = \tau$, stating that every $\alpha$-cut of the conditioning proposition p is fully enclosed 
(in the conventional sense) in the $\alpha$-cut of the conditioned proposition q, has been called the 
truth mapping in the fuzzy logic literature. 

The primary purpose of conditional distributions, however, is to provide a quantitative measure 
of the strength by which one proposition may be said to imply another with a view to extend 
inferential procedures by means of structures that superimpose the topological notion of continuity 
upon a logical framework concerned with propositional validity. 
5 GENERALIZED INFERENCE 


The major inferential tool of fuzzy logic is the compositional rule of inference of Zadeh [53], which 
generalizes the corresponding classical rule of inference by its ability to infer valid statements even 
when a perfect match between facts and rule antecedent does not exist, i.e., from 

$$ \frac{p, \quad p \rightarrow q}{q} $$

to its “approximate” version 

$$ \frac{p', \quad p \rightarrow q}{q'}, $$

where p' and q' are similar to p and q, respectively. In this sense, the generalized modus ponens 
operates as an “interpolation” (or, more precisely, as an “extrapolation”) procedure in possible-world 
space. 

Unlike the interpolation procedures of numerical analysis, however, which yield estimates of 
function value, this extrapolation procedure approximates truth in the sense that it produces a 
proposition that is both more general than the consequent of the inferential rule and resembles it 
to some degree (which is a function of the degree by which p' resembles p). The “extrapolated 
conclusion,” however, is a correctly derived proposition, i.e., the result of a sound logical procedure 
rather than of an approximate heuristic technique. 

5.1 Generalized Modus Ponens 

The theorems that are proven below are based on the use of a family of propositions $\mathcal{P}$ that 
partitions the universe of discourse U in the sense that every possible world will satisfy at least one 
proposition in $\mathcal{P}$. 

Definition: If $\mathcal{P}$ is a subset of satisfiable propositions in $\mathcal{L}$ such that, for every 
possible world w in the universe U, there exists a proposition p in $\mathcal{P}$ such that $w \vdash p$, 
then the family $\mathcal{P}$ is called a partition of U. 

These results make use of information such as the values of the unconditioned necessity (resp., 
possibility) distributions for antecedent propositions p in the family $\mathcal{P}$ together with the 
values Nec(q|p) (resp., Poss(q|p)) to “extend” the unconditioned distributions to the “consequent” 
proposition q. In this sense, these findings interpret, in the same spirit used in the theorem of 
Section 4.4 for other basic laws, the generalized modus ponens laws of fuzzy logic: 

$$ \mathrm{Nec}(q) = \sup_{p \in \mathcal{P}} \big[ \mathrm{Nec}(q \mid p) \circledast \mathrm{Nec}(p) \big], $$

$$ \mathrm{Poss}(q) = \sup_{p \in \mathcal{P}} \big[ \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \big]. $$
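Theorem (Generalized Modus Ponens for Necessity Functions): Let $\mathcal{P}$ be a partition of U and 
let q be a proposition. If Nec(p) and Nec(q|p) are real values, defined for every proposition p in 
the partition $\mathcal{P}$, such that 

$$ \mathrm{Nec}(p) \le I(p \mid \mathcal{E}), $$

$$ \mathrm{Nec}(q \mid p) \le \inf_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash I(p \mid w) \big], $$

then the following inequality is valid: 

$$ \sup_{p \in \mathcal{P}} \big[ \mathrm{Nec}(q \mid p) \circledast \mathrm{Nec}(p) \big] \le I(q \mid \mathcal{E}). $$

Proof: Note first that since $\oslash$ is nonincreasing in its second argument and since 

$$ I(p \mid \mathcal{E}) \le I(p \mid w) $$

for every evidential world w, it is 

$$ \mathrm{Nec}(q \mid p) \le \inf_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash I(p \mid w) \big] \le \inf_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash I(p \mid \mathcal{E}) \big]. $$

It follows then from the monotonicity and continuity of $\circledast$ with respect to its arguments that 

$$ \mathrm{Nec}(p) \circledast \mathrm{Nec}(q \mid p) \le I(p \mid \mathcal{E}) \circledast \inf_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash I(p \mid \mathcal{E}) \big] $$
$$ = \inf_{w \vdash \mathcal{E}} \; I(p \mid \mathcal{E}) \circledast \big( I(q \mid w) \oslash I(p \mid \mathcal{E}) \big) $$
$$ \le \inf_{w \vdash \mathcal{E}} I(q \mid w) $$
$$ = I(q \mid \mathcal{E}), $$

since 

$$ I(p \mid \mathcal{E}) \circledast \big( I(q \mid w) \oslash I(p \mid \mathcal{E}) \big) \le I(q \mid w), $$

because of the definition of $\oslash$ and the continuity of $\circledast$. 

Since the above inequality is valid for any proposition p in $\mathcal{P}$, the theorem follows. ∎ 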
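
The bound of the theorem can be evaluated on a small example. The following sketch is only an illustration under assumptions not made in the report: worlds are points of the real line, the similarity is max(0, 1 - |w - w'|), the Lukasiewicz norm and its pseudoinverse are used, and the necessity values are chosen as sharp as the hypotheses permit. All names are hypothetical.

```python
def t_norm(a, b):
    """Lukasiewicz triangular norm (an assumed choice)."""
    return max(a + b - 1.0, 0.0)


def pseudoinverse(a, b):
    """Pseudoinverse of the Lukasiewicz norm."""
    return min(1.0 + a - b, 1.0)


def similarity(w, v):
    """Assumed similarity: 1 minus the distance between worlds, floored at 0."""
    return max(0.0, 1.0 - abs(w - v))


def implication(p, q):
    """Degree of implication I(p | q) for finite sets of worlds."""
    return min(max(similarity(w, v) for w in p) for v in q)


evidence = [0.25, 0.5]
q = [0.75, 1.0]
# Two propositions that together cover the universe {0, 0.25, 0.5, 0.75, 1}.
partition = [[0.0, 0.25, 0.5], [0.5, 0.75, 1.0]]


def nec(p):
    """Sharpest unconditioned necessity value: I(p | evidence)."""
    return implication(p, evidence)


def cond_nec(q, p):
    """Sharpest conditional necessity value: inf over evidential worlds."""
    return min(pseudoinverse(implication(q, [w]), implication(p, [w])) for w in evidence)


bound = max(t_norm(cond_nec(q, p), nec(p)) for p in partition)
print(bound, implication(q, evidence))  # 0.5 0.5
```

In this particular configuration the propagated lower bound coincides with the degree of implication of q by the evidence; in general the theorem guarantees only that it does not exceed it.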

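A dual result also holds for possibility functions. 

Theorem (Generalized Modus Ponens for Possibility Functions): Let $\mathcal{P}$ be a partition of U and 
let q be a proposition. If Poss(p) and Poss(q|p) are real values, defined for every proposition p in 
$\mathcal{P}$, such that 

$$ \mathrm{Poss}(p) \ge C(p \mid \mathcal{E}), $$

$$ \mathrm{Poss}(q \mid p) \ge \sup_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash I(p \mid w) \big], $$

then the following inequality is valid: 

$$ \sup_{p \in \mathcal{P}} \big[ \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \big] \ge C(q \mid \mathcal{E}). $$

Proof: Note first that if w is an evidential world, then 

$$ C(p \mid \mathcal{E}) \ge I(p \mid w). $$

It follows then from the nonincreasing nature of $\oslash$ with respect to its second argument that 

$$ \mathrm{Poss}(q \mid p) \ge \sup_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash I(p \mid w) \big] \ge \sup_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash C(p \mid \mathcal{E}) \big], $$

and, therefore, that 

$$ \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \ge \sup_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash C(p \mid \mathcal{E}) \big] \circledast C(p \mid \mathcal{E}). $$

Taking now, in the above expression, the supremum with respect to all propositions p in $\mathcal{P}$, it is 

$$ \sup_{p \in \mathcal{P}} \big[ \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \big] \ge \sup_{p \in \mathcal{P}} \Big\{ \sup_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash C(p \mid \mathcal{E}) \big] \circledast C(p \mid \mathcal{E}) \Big\}. \qquad (1) $$

Note, however, that since $\mathcal{P}$ is a partition, there always exists a proposition p in $\mathcal{P}$ 
such that $C(p \mid \mathcal{E}) = 1$ (i.e., p “intersects” $\mathcal{E}$) and, therefore, 

$$ \sup_{p \in \mathcal{P}} \Big\{ \sup_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash C(p \mid \mathcal{E}) \big] \circledast C(p \mid \mathcal{E}) \Big\} \ge \sup_{w \vdash \mathcal{E}} \big[ I(q \mid w) \oslash 1 \big] \circledast 1 = \sup_{w \vdash \mathcal{E}} I(q \mid w) = C(q \mid \mathcal{E}). \qquad (2) $$

The thesis follows at once by combination of the inequalities (1) and (2). ∎ 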
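
The possibilistic counterpart can be evaluated in the same way. The sketch below is again only an illustration under assumptions not made in the report: worlds on the real line, similarity max(0, 1 - |w - w'|), the Lukasiewicz norm with its pseudoinverse, and possibility values chosen as sharp as the hypotheses permit. All names are hypothetical.

```python
def t_norm(a, b):
    """Lukasiewicz triangular norm (an assumed choice)."""
    return max(a + b - 1.0, 0.0)


def pseudoinverse(a, b):
    """Pseudoinverse of the Lukasiewicz norm."""
    return min(1.0 + a - b, 1.0)


def similarity(w, v):
    """Assumed similarity: 1 minus the distance between worlds, floored at 0."""
    return max(0.0, 1.0 - abs(w - v))


def implication(p, q):
    """Degree of implication I(p | q) for finite sets of worlds."""
    return min(max(similarity(w, v) for w in p) for v in q)


def consistence(p, q):
    """Degree of consistence C(p | q) for finite sets of worlds."""
    return max(similarity(w, v) for w in p for v in q)


evidence = [0.25, 0.5]
q = [0.75, 1.0]
# Two propositions that together cover the universe {0, 0.25, 0.5, 0.75, 1}.
partition = [[0.0, 0.25, 0.5], [0.5, 0.75, 1.0]]


def poss(p):
    """Sharpest unconditioned possibility value: C(p | evidence)."""
    return consistence(p, evidence)


def cond_poss(q, p):
    """Sharpest conditional possibility value: sup over evidential worlds."""
    return max(pseudoinverse(implication(q, [w]), implication(p, [w])) for w in evidence)


bound = max(t_norm(cond_poss(q, p), poss(p)) for p in partition)
print(bound, consistence(q, evidence))  # 0.75 0.75
```

Both members of the covering family intersect the evidential set here, so the propagated upper bound is attained by either of them.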


Finally, notice also that, although the theorems above have been characterized as duals, it is 
not necessary that $\mathcal{P}$ be a partition for the generalized modus ponens for necessities to hold, 
while the proof of its possibilistic counterpart relies on such an assumption. It should be clear, however, 
that richer propositional collections would lead to better lower bounds for values of the degree 
of implication $I(q \mid \mathcal{E})$. 


5.2 Variables 

The ®-transitivity property of I is the essential fact expressing the relationships between the degrees 
of implication of three propositions that were proven in the previous section. The statements of 
these relations in most works devoted to fuzzy logic are made, however, using special subsets of the 
universe of discourse that are described through the important notion of variable. Introduction of 
this concept, which is also central to other approximate reasoning methodologies, permits us to make 
a clearer distinction between similarities defined, in some absolute sense, from the joint viewpoint of 
several respects and related proximity measures that compare objects (in our case, possible worlds) 
from the marginal viewpoint of one or more variables. 

In what follows, we will assume that only certain propositions, specifying the value of a system 
variable belonging to a finite set 

$$ \mathcal{V} = \{ X, Y, Z, \ldots \}, $$

will be used to characterize possible worlds. 
The propositions of interest are those formed by logical combination of statements of the type 

“The value of the variable V is v,” 

where V is in the variable set $\mathcal{V}$ and where v is a specific value in the domain 
$\mathcal{D}(V)$ of the variable V. 

We will also assume that, in any possible world, the value of any variable is a member of the 
corresponding domain of definition of the variable. In the context of our discussion, we will not 
need to make special assumptions about the scalar or numeric nature of the state variables, using 
the notion in the same primitive and general sense in which it is customarily used in the predicate 
calculus. 

We will be especially interested in subsets, called variable-sets, of the universe $U$ consisting of
worlds where the value of some variable $V$ is equal to a specified value $v$. We will denote by $[X = x]$
(similarly $[Y = y]$, etc.) the set of all possible worlds where the proposition "The value of the
variable X is x" is true. Clearly, the variable-sets in the collection

\[ \{\, [X = x] : x \text{ is in } \mathcal{D}(X) \,\} \]

partition the universe into disjoint subsets. These collections have recently been used to characterize
the concept of rough sets [30], of importance in many information-system analysis problems,
including some that arise in the context of approximate reasoning. A similar notion has also been
used to describe algorithms for the combination of probabilities and of belief functions [39].
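The partition induced by the variable-sets of one variable can be made concrete with a short sketch. The variables, their domains, and the representation of a world as a dictionary are assumptions chosen only for illustration.

```python
from itertools import product

# Hypothetical variables and domains.
domains = {"X": ["low", "high"], "Y": [0, 1, 2]}

# A possible world assigns one domain value to every variable.
names = sorted(domains)
universe = [dict(zip(names, values))
            for values in product(*(domains[n] for n in names))]

def variable_set(variable, value):
    """Indices of the worlds in [variable = value]."""
    return {i for i, w in enumerate(universe) if w[variable] == value}

# The variable-sets [X = x], one for each x in the domain of X.
cells = [variable_set("X", x) for x in domains["X"]]
```

The cells are pairwise disjoint and their union is the whole universe, which is the partition property used in the text.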

To simplify the notation we will write

\[ w \vdash x, \quad w \vdash y, \quad \ldots \]

as shorthand for $w \vdash [X = x]$, $w \vdash [Y = y]$, ..., respectively.

5.2.1 Possibilistic Structures and Laws 

The usual statements of the laws of fuzzy logic are made, as mentioned before, through the use of
variables rather than by means of general symbolic expressions. It is customary, for example, to
speak of the possibility of the variable $X$ taking the value $x$, to describe the value that a possibility
function for an evidential set $\mathcal{E}$ attains for the proposition $[X = x]$.

In our model, we will say, therefore, that a function

\[ \mathrm{Poss}(\cdot) : \mathcal{D}(X) \to [0,1] \]

is a possibility function for the evidential set $\mathcal{E}$ and the variable $X$, whenever

\[ \mathrm{Poss}(x) \ge C([X = x] \mid \mathcal{E}), \]

for all values $x$ in the domain $\mathcal{D}(X)$. Similarly, we will say that $\mathrm{Nec}(\cdot)$ is a necessity function for
$X$ whenever

\[ \mathrm{Nec}(x) \le I([X = x] \mid \mathcal{E}), \]

for all values $x$ in $\mathcal{D}(X)$.

If possibility distributions are defined in this way as point functions in the variable
domain $\mathcal{D}(X)$, then it is possible to use the disjunctive laws of fuzzy logic proved in Section 4.4 to
extend their definition over the power set of $\mathcal{D}(X)$, i.e.,

\[ \mathrm{Nec}(A \cup B) = \max\left[ \mathrm{Nec}(A), \mathrm{Nec}(B) \right], \]

\[ \mathrm{Poss}(A \cup B) = \max\left[ \mathrm{Poss}(A), \mathrm{Poss}(B) \right], \]

where $A$ and $B$ are subsets of the domain $\mathcal{D}(X)$. These equations are usually given as the basic
disjunctive laws of possibility distributions.

Note that, using such extensions, both possibility and necessity functions are nondecreasing
functions (with respect to the order induced by set inclusion). The value of $\mathrm{Nec}(A)$ measures
the extent to which the evidence supports the statement that the variable value necessarily lies in
the subset $A$ of its domain of definition, with a dual interpretation being applicable for possibility
distributions.
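The extension by maximization and the resulting monotonicity can be sketched as follows. The point values are hypothetical, and assigning 0 to the empty subset is a convention of this sketch rather than something stated in the text.

```python
def extend_by_max(point_function):
    """Extend a point function on a domain to subsets by maximization."""
    def on_subsets(subset):
        # The empty subset is given the value 0.0 (assumed convention).
        return max((point_function[v] for v in subset), default=0.0)
    return on_subsets

# Hypothetical point values for a variable with domain {a, b, c}.
poss_point = {"a": 1.0, "b": 0.6, "c": 0.2}
poss = extend_by_max(poss_point)
```

For example, `poss({"b", "c"})` equals `max(poss({"b"}), poss({"c"}))`, and enlarging a subset never lowers its value.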

5.2.2 Marginal and Joint Possibilities 

The original similarity relation introduced in Section 3.1 may be considered to be a measure of 
proximity between possible worlds from the joint viewpoint of all system variables. The notion 
of variable permits, however, the definition of similarities from the restricted viewpoint of some 
variables or subsets of variables. 

These restricted perspectives play a role with respect to the original similarity $S$ that is analogous
to that of marginal probability distributions with respect to joint probability distributions. To derive
useful expressions that describe similarities between two values $x$ and $x'$ of the same variable $X$,
it should be noted first that the degree of implication $I(\cdot \mid \cdot)$ is transitive. This fact permits the
application of a theorem of Valverde [44] to define a function $S_X$ by means of the expression

\[ S_X : \mathcal{D}(X) \times \mathcal{D}(X) \to [0,1] : (x, x') \mapsto \min\left[ I(x \mid x'), I(x' \mid x) \right]. \]

Defined in this way as a "symmetrization" of the preorder induced by the degree of implication
$I(\cdot \mid \cdot)$, the marginal similarity $S_X$ has the properties of a similarity function. Furthermore, the
"projection" operation entailed by the use of $I(x \mid x')$ (based on the projection of every $x'$-world
into the set of $x$-worlds) may be considered to be the basic mechanism to transform the original
similarity function into one that only discerns differences in the values of the variable $X$.

It must be noted, however, that, unless additional assumptions are made about the nature of the
original similarity $S$, the function $S_X$ fails to satisfy the intuitive requirement

\[ S(w, w') \le S_X(x, x'), \]

whenever $w \vdash x$ and $w' \vdash x'$, i.e., that the similarity between two objects from a restricted viewpoint is
never lower than their similarity from more general regards that encompass additional criteria of
comparison.
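A three-world example shows how the marginal similarity can fall below the joint similarity of two particular worlds. The sketch assumes that $I(x \mid x')$ is the infimum, over $x'$-worlds, of the supremum, over $x$-worlds, of $S$; the worlds and the line-based similarity are hypothetical.

```python
# Hypothetical worlds: (value of X, position on a line in [0, 1]).
worlds = [("x", 0.0), ("x'", 0.1), ("x'", 0.9)]

def sim(w, v):
    """Joint similarity S: 1 minus the distance between positions."""
    return 1.0 - abs(w[1] - v[1])

def worlds_with(value):
    return [w for w in worlds if w[0] == value]

def implication(x, x_prime):
    """I(x | x'): inf over x'-worlds of the sup over x-worlds of S."""
    return min(max(sim(w, v) for w in worlds_with(x))
               for v in worlds_with(x_prime))

def marginal_sim(x, x_prime):
    """S_X(x, x') = min[I(x | x'), I(x' | x)]."""
    return min(implication(x, x_prime), implication(x_prime, x))

joint = sim(worlds[0], worlds[1])     # S(w, w') for one x-world and one x'-world
marginal = marginal_sim("x", "x'")    # S_X(x, x')
```

Here `joint` is about 0.9 while `marginal` is about 0.1, because the distant $x'$-world at position 0.9 drags the infimum down.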

Although considerable research remains to identify alternative definitions of marginal similarities
that are not hampered by this problem, a basic result of Valverde [44], presented in Section 6.2 below,
appears to provide the essential tool that must be employed to produce the required coarser
measures. The role of additional reasonable assumptions that might be demanded from $S$ so as to
facilitate the construction of marginal similarities with desirable characteristics is also the object of
current investigations of the author.

5.2.3 Conditional Distributions and Generalized Inference

The basic conditional structures of fuzzy logic are usually defined as elastic constraints that restrict
the values of a variable given those of another. By simple extension of our previous convention to
conditional structures, we will write $\mathrm{Nec}(y \mid x)$ and $\mathrm{Poss}(y \mid x)$ as shorthand for

\[ \mathrm{Nec}([Y = y] \mid [X = x]) \quad \text{and} \quad \mathrm{Poss}([Y = y] \mid [X = x]), \]

respectively.

If a classical (i.e., Boolean) inferential rule of the type 

“If X = x, then Y is in R(x)” 


is thought of as the definition of a relation $R$ defined over pairs $(x, y)$ in the Cartesian product
$X \times Y$, then such a relation may be used to define a multivalued mapping that maps possible values
of $X$ into possible values of $Y$ as illustrated in Figure 5.

Figure 5: Inference as a Compatibility Relation.

Such a compatibility relation perspective was an essential element of the original formulations
of both the Dempster-Shafer calculus of evidence [8], where distributions in some space (i.e., the
domain of some variable $X$) are mapped into distributions in another space (i.e., the domain of
another variable $Y$) by direct transfer of "mass" from individual values to the union of their mapped
projections, and the compositional rule of inference [51].

Note that, whenever $\mathrm{Poss}(y \mid x) = 1$, if the bound is actually attained, i.e., if

\[ \sup_{w \vdash \mathcal{E}} \left[ I(y \mid w) \oslash I(x \mid w) \right] = 1, \]

then it is possible for an evidential world $w$ in $[X = x]$ (i.e., $I(x \mid w) = 1$) to be such that $w \vdash y$.
Pairs $(x, y)$ such that $\mathrm{Poss}(y \mid x) = 1$ may be considered to approximate the core¹⁰ of a generalized
inferential relation that allows the determination of bounds for the similarity between evidential worlds
and those in the variable set $[Y = y]$ on the basis of knowledge of similar bounds applicable to
the variable set $[X = x]$. This relation, which is the fuzzy extension of the classical compatibility
mapping $R$ illustrated in Figure 5, may be thought of as a descriptor of the behavior, for $x$-worlds,
of the values of the variable $Y$ "near" $R$. The compatibility relation is itself approximated by (or
embedded in) the core of the conditional possibility distribution, i.e., worlds $w$ such that $w \vdash x$ and
$w \vdash y$, with $\mathrm{Poss}(y \mid x) = 1$.

Since the collection of the sets $[X = x]$ partitions the universe $U$ into disjoint sets, the
generalized modus ponens laws may be readily stated in terms of variable values as

\[ \mathrm{Nec}(y) = \sup_{x} \left[ \mathrm{Nec}(y \mid x) \circledast \mathrm{Nec}(x) \right], \]

\[ \mathrm{Poss}(y) = \sup_{x} \left[ \mathrm{Poss}(y \mid x) \circledast \mathrm{Poss}(x) \right], \]

clearly showing the basic nature of the inferential mapping as the composition of relational combination
(i.e., $\circledast$-"intersection") and projection (i.e., maximization).
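The composition of $\circledast$-combination and maximization can be sketched for finite domains. The distributions below are hypothetical and use dyadic values so that floating-point results are exact; the two T-norms shown are the minimum and the Lukasiewicz norm.

```python
def t_min(a, b):
    return min(a, b)

def t_lukasiewicz(a, b):
    return max(a + b - 1.0, 0.0)

def compose(conditional, marginal, t_norm):
    """Return y -> sup over x of t_norm(conditional[y][x], marginal[x])."""
    return {y: max(t_norm(row[x], marginal[x]) for x in marginal)
            for y, row in conditional.items()}

# Hypothetical possibility values for X and for Y given X.
poss_x = {"x1": 1.0, "x2": 0.75}
poss_y_given_x = {"y1": {"x1": 0.25, "x2": 0.75},
                  "y2": {"x1": 1.0, "x2": 0.5}}

poss_y_min = compose(poss_y_given_x, poss_x, t_min)          # {'y1': 0.75, 'y2': 1.0}
poss_y_luk = compose(poss_y_given_x, poss_x, t_lukasiewicz)  # {'y1': 0.5, 'y2': 1.0}
```

The same function serves for the necessity law, since only the input distributions change.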


5.2.4 Fuzzy Implication Rules 

In this section we will examine proposed interpretations for conditional rules, usually stated in the 
form 

If X is A, then Y is B , 

within the context of possibilistic logic. While, in two-valued logic, any such rule simply states that 
whenever a condition A is true, another condition B also holds, various interpretations have been 
proposed for rules expressing other notions of conditional truth. 

In the case of probabilities, for example, degrees of conditionality have been modeled either by
means of conditional probability values $\mathrm{Prob}(B \mid A)$, which measure the likelihood of $B$ given the
assumed truth of $A$, or by the alternative interpretation $\mathrm{Prob}(\neg A \vee B)$, used by Nilsson [29] in his
probabilistic logic, which essentially quantifies the probability that a rule is a valid component of a
knowledge base. Either one of these interpretations is valid in particular contexts, being, respectively,
the probabilistic extensions of the so-called "de re," i.e.,

\[ p \to \Box q, \]

and "de dicto," i.e.,

\[ \Box (p \to q), \]

interpretations of conditionals in modal logic.

10 The core of a fuzzy set $\mu : U \to [0,1]$ is the set of all points $w$ such that $\mu(w) = 1$, i.e., the points that "fully"
belong to $\mu$.

In fuzzy logic, two major interpretations have been advanced to translate conditional rules,¹¹
with $A$ and $B$ corresponding to the fuzzy sets

\[ \mu_A : X \to [0,1] \quad \text{and} \quad \mu_B : Y \to [0,1]. \]

The first interpretation was originally proposed by Zadeh [52], as a formal translation of the
statement

If $\mu_A$ is a possibility for $X$, then $\mu_B$ is a possibility distribution for $Y$.

This conditional statement, which may be regarded as a constraint on the values of one variable
given those of another, states the existence of a conditional possibility function $\mathrm{Poss}(\cdot \mid \cdot)$ such that

\[ \mu_B(y) \ge \sup_{x} \left[ \mathrm{Poss}(y \mid x) \circledast \mu_A(x) \right] \ge \mathrm{Poss}(y \mid x) \circledast \mu_A(x). \]

Recalling now the definition and properties of the pseudoinverse, we may restate this particular
interpretation as

\[ \mathrm{Poss}(y \mid x) = \mu_B(y) \oslash \mu_A(x) \ge I(y \mid w) \oslash I(x \mid w), \]

for every world $w \vdash \mathcal{E}$.

In Zadeh's original formulation, made within the context of a calculus based on the minimum
function as the T-norm, conditionals were, however, formally translated by means of the pseudoinverse
of the Lukasiewicz T-norm. Certain formal problems associated with such a combination were
pointed out by Trillas and Valverde [42], who developed translations consistent with the T-norm
used as the basis for the possibilistic calculus.
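The translation of a single rule through the pseudoinverse can be sketched for two T-norms. The sketch assumes the usual residuation reading of the pseudoinverse, $a \oslash b = \sup\{c : b \circledast c \le a\}$; the membership values and function names are hypothetical.

```python
def residuum_min(a, b):
    """a (/) b for the minimum T-norm: sup{c : min(b, c) <= a}."""
    return 1.0 if b <= a else a

def residuum_lukasiewicz(a, b):
    """a (/) b for the Lukasiewicz T-norm: sup{c : max(b + c - 1, 0) <= a}."""
    return min(1.0, 1.0 - b + a)

def rule_conditional(mu_a, mu_b, residuum):
    """Poss(y | x) = mu_B(y) (/) mu_A(x) on finite domains."""
    return {y: {x: residuum(mu_b[y], mu_a[x]) for x in mu_a} for y in mu_b}

# Hypothetical membership values for the rule "If X is A, then Y is B".
mu_a = {"x1": 1.0, "x2": 0.5}
mu_b = {"y1": 1.0, "y2": 0.25}

poss_min = rule_conditional(mu_a, mu_b, residuum_min)
poss_luk = rule_conditional(mu_a, mu_b, residuum_lukasiewicz)
```

In both cases, combining the resulting `Poss(y | x)` with `mu_a[x]` through the matching T-norm never exceeds `mu_b[y]`, which is the constraint that the rule expresses.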

Using the characterization of conditionals introduced in Section 4.5, this relation may also be
thought of as a measure of the degree by which a possibility for $Y$ exceeds a fraction (measured
by the conditional possibility distribution) of a given possibility distribution for $X$. In particular,
whenever $\mathrm{Poss}(y \mid x) = 1$, then $\mu_B(y) \ge \mu_A(x)$, indicating the possible existence (since $\mathrm{Poss}(y \mid x)$
is only an upper bound of $I(y \mid w) \oslash I(x \mid w)$) of an evidential world such that $w \vdash x$ and $w \vdash y$,
with $x$ in $A$ and $y$ in $B$.

As illustrated in Figure 6, where it has been assumed that the underlying metric (i.e., dissimilarity)
is proportional to the Euclidean distance in the plane, the core of the corresponding conditional
possibility distribution is an (upper) approximant of a classical compatibility relation (indicated by
the shaded area in the figure) that fans outward from the Cartesian product of the cores of $A$ and $B$.
If this interpretation is taken, whenever several such rules are available, then each one of these rules
will lead to a separate possibility distribution. Combination of these upper bounds by minimization
results in a sharper possibility estimate that represents the "integrated" effect of the rule set.

The second interpretation of conditional relations, leading to a wide variety of practical applications
[41], was utilized by Mamdani and Assilian to develop fuzzy controllers. The basic idea
underlying this explanation follows an approach originally outlined by Zadeh [47,48,51]. In this case,
a number of conditional statements of the form

If $X$ is $A_k$, then $Y$ is $B_k$, $k = 1, 2, \ldots, n$,

are given as a combined "disjunctive" description of the relation between $X$ and $Y$, rather than
as a set of independently valid rules. The purpose of this rule set is the approximation of the
compatibility relation by a "fuzzy curve" generated by disjunction of all the rules in the set, as
shown in Figure 7.

11 A rather encompassing account of potential fuzzy reasoning mechanisms can be found in a paper by Mizumoto,
Fukami, and Tanaka [27].

Recalling the characterization of conditioning as an extension of a classical compatibility relation,
we may say that the core of the compatibility relation is approximated from above by the union

\[ \bigcup_{k=1}^{n} \left[ \mathrm{core}(\mu_{A_k}) \times \mathrm{core}(\mu_{B_k}) \right] \]

of the Cartesian products of the cores of the fuzzy sets for $A_k$ and $B_k$. In this case the multiple rules
are meant to approximate some region of possible $(X, Y)$ values, and the results of application of
individual component rules must be combined using maximization to produce a conditional possibility
function. We may say, therefore, that under the Zadeh-Mamdani-Assilian (ZMA) interpretation,
the function

\[ \mathrm{Poss}(y \mid x) = \sup_{k} \left[ \min\left( \mu_{A_k}(x), \mu_{B_k}(y) \right) \right] \]

is a conditional possibility for $Y$ given $X$.

It is important to note that the two interpretations of fuzzy rules that we have just examined
are based on different approaches to the approximation (from above) of the value

\[ \sup_{w \vdash \mathcal{E}} I(y \mid w) \oslash I(x \mid w), \]

being, in the case of the Zadeh-Trillas-Valverde (ZTV) method, the result of the conjunction of
multiple fuzzy relations such as that illustrated in Figure 8, while, in the case of the ZMA logic, the
construction requires disjunction of relations such as that illustrated in Figure 9.
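The contrast between the conjunctive (ZTV) and disjunctive (ZMA) readings of one rule set can be sketched numerically. The example assumes the minimum T-norm with its pseudoinverse for the ZTV reading; the triangular membership functions and the two rules are hypothetical.

```python
def ztv_relation(rules, x, y):
    """Conjunctive reading: min over rules of mu_B(y) (/) mu_A(x)."""
    def residuum(a, b):
        # Pseudoinverse of the minimum T-norm: sup{c : min(b, c) <= a}.
        return 1.0 if b <= a else a
    return min(residuum(mu_b(y), mu_a(x)) for mu_a, mu_b in rules)

def zma_relation(rules, x, y):
    """Disjunctive reading: max over rules of min(mu_A(x), mu_B(y))."""
    return max(min(mu_a(x), mu_b(y)) for mu_a, mu_b in rules)

def triangle(center, half_width):
    """Hypothetical triangular membership function."""
    return lambda v: max(0.0, 1.0 - abs(v - center) / half_width)

# "If X is about 0, then Y is about 0" and "If X is about 4, then Y is about 4".
rules = [(triangle(0.0, 2.0), triangle(0.0, 2.0)),
         (triangle(4.0, 2.0), triangle(4.0, 2.0))]

at_cores = (ztv_relation(rules, 0.0, 0.0), zma_relation(rules, 0.0, 0.0))
between = (ztv_relation(rules, 2.0, 0.0), zma_relation(rules, 2.0, 0.0))
```

At the point (0, 0), inside the cores of the first rule, both readings give 1. At x = 2, where neither antecedent applies, the ZTV reading leaves Y unconstrained (value 1), while the ZMA reading gives 0 because no rule covers that region.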

The difference between both approaches when combining several rules is illustrated also in Figures
10 and 11, showing the contour plots for the $\alpha$-cuts of the fuzzy relations that are obtained
in a simple example involving four rules. In these figures, the rectangles with a dark outline correspond
to the Cartesian products of the cores of the antecedents $A_k$ and consequents $B_k$. Darker shades of gray
correspond to higher degrees of membership.

The reader should be cautioned, however, about the potential for invalid comparisons that may 
result from hasty examination of these figures. Each formalism should be regarded as a procedure for 
the approximation of a compatibility relation that is based on a different approach for the description 
of relationships between variables. In the case of the ZMA interpretation, the intent is to generalize 
the interpolation procedures that are normally employed in functional approximation. As such, this 
approach may be said to be inspired by the methodology of classical system analysis. The ZTV 
approach, by contrast, is a generalization of classical logical formulations and may be regarded, 
from a relational viewpoint, as a procedure to describe a function as the locus of points that satisfies 
a set of constraints rather than as a subset of “fuzzy points” of a Cartesian product. 

Figures 10 and 11, while showing that the same rule sets would lead to radically different results,
should not be considered, therefore, to discredit interpolative approaches, as such techniques,
proceeding from a different perspective, should normally be based on rule sets that are different from
those utilized when rules are thought of as independent constraints.

Figure 9: A Component of a Disjunctive Rule Set (ZMA)

Figure 10: Contour Plots for a Rule Set (ZTV)

Figure 11: Contour Plots for a Rule Set (ZMA)

6 THE NATURE OF SIMILARITY RELATIONS


In this closing section, we will examine issues that arise naturally from our previous examination of 
the role of similarities as the semantic bases for possibility theory. 

Our discussion focuses on two topics. We look first at the requirements that our theory imposes 
upon the nature of the scales used to measure proximity or resemblance between possible worlds. 
Finally, our examination of the interplay between similarities and possibilities turns to issues related 
to the generation of similarity relations from such sources as domain knowledge that describes 
significant relations between system variables. 

6.1 On Similarity Scales 

Our previous interpretation of possibilistic concepts and structures has been based on the use of
measures of proximity that quantify interobject resemblance using real numbers between 0 and 1.
Our assumptions about the use of the [0,1] interval as a similarity scale have been made primarily, 
however, as a matter of convenience so as to simplify the description of our model while being 
consistent with the customary definitions of possibility and necessity distributions as functions taking 
values in that interval. 

Close examination of the actual requirements imposed upon our similarity scales reveals, however, 
that our measurement domain may be quite general so as to include symbolic structures such as 

{ identical, very similar, ..., completely dissimilar }.

Our model is based on the use of a partially ordered set having a maximal and a minimal element
that measure identity and complete dissimilarity, respectively. Furthermore, we have assumed the
existence of a binary operation (the triangular norm $\circledast$) mapping pairs of similarity values into real
numbers, with certain desirable order-preserving and transitive properties. The concept of triangular
norm, however, does not rely substantially on the use of real numbers as its range and may be readily
extended to more general partially ordered sets with maximal and minimal elements.

We have also assumed a continuity property for the triangular norm operation. This property,
however, simply requires that a notion of proximity also exist among similarity values so as to
provide a form of (order-consistent) topology in that space. While, in general, more precise scales
will result in more detailed representations of interworld similarity, it is important to stress that the
similarity-based model presented here does not rely on "denseness" assumptions such as the existence
of an intermediate value $c$ between any different values $a$ and $b$ in the similarity-measurement scale.
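A finite symbolic scale equipped with triangular norms can be sketched as follows. The five levels and their names are hypothetical (the text only names the extremes and "very similar"), and the scale is taken to be totally ordered for simplicity, although the model requires only a partial order.

```python
from enum import IntEnum

class Resemblance(IntEnum):
    """Hypothetical five-level symbolic similarity scale."""
    COMPLETELY_DISSIMILAR = 0
    SOMEWHAT_SIMILAR = 1
    SIMILAR = 2
    VERY_SIMILAR = 3
    IDENTICAL = 4

def t_norm(a, b):
    """Minimum T-norm: needs only the order of the scale, not arithmetic."""
    return min(a, b)

def bounded_t_norm(a, b):
    """A Lukasiewicz-like T-norm computed on the level indices."""
    top = Resemblance.IDENTICAL
    return Resemblance(max(a + b - top, 0))
```

No value lies between two adjacent levels, so the scale is not dense, yet `IDENTICAL` still acts as the unit of both operations.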

From a practical viewpoint, the major requirement is to quantify proximity in such a way as to
be able to determine that two quantities are similar to some degree (i.e., approximate matching).
The degree of precision that such a matching entails is problem-dependent and will be typically the
result of conflicting impositions between the desire, on one hand, to keep granularity relatively high
to reduce complexity, and the need, on the other, to describe system behavior at an acceptable level
of accuracy. The work of Bonissone and Decker [4] is a significant example of the type of systematic
study that must be carried out to define similarity scales that are both useful and tractable.

6.2 The Origin of Similarity Functions

The model of fuzzy logic presented in this note is centered on the metric notion of similarity as a 
primitive concept that is useful to explain the nature of possibilistic constructs and the meaning 
of possibilistic reasoning. In this formulation, similarities are defined as real functions defined over 
pairs of possible worlds. 

From this perspective, similarities describe relations of resemblance between objects of high complexity,
which, typically, result from consideration of a large number of system variables. Reliance
on such complex structures has been the direct consequence of a research program that stressed
conceptual clarification as its primary objective. In practice, however, it will be generally difficult
to define complex measures that quantify similarity between complex objects on the basis of a large
number of criteria.

Similarities provide the framework that is required to understand approximate relations of corelevance,
usually stated as generalized conditional rules. The practical generation of similarity functions
typically proceeds, however, in the opposite direction, from separate statements about limited aspects
of system behavior to general metric structures. Once such resemblance measures are defined,
they may be used to express and acquire new laws of system behavior determined, for example, from
historical experience with similar systems. Furthermore, such similarity notions may be used as the
basis for analogical reasoning systems that try to determine system state on the basis of similarity
to known cases [23].

Perhaps the simplest mechanism that may be devised to generate complex metrics from simpler
ones is that which starts with measures of resemblance that quantify proximity from a limited
viewpoint. These metrics are usually derived, using a variety of techniques, in unsupervised pattern
classification (or clustering) problems [20]. In many important applications, hierarchical taxonomies
(a feature of many representation approaches in artificial intelligence) may be used, often in connection
with a variety of weighting schemes that quantify branching importance, to generate metrics
that often satisfy the more stringent requirements of an ultrametric [22].
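One simple way in which a taxonomy yields an ultrametric structure is to measure resemblance by the depth of the lowest common ancestor. The sketch below assumes an unweighted tree whose leaves all lie at the same depth; the taxonomy itself and the scaling to [0, 1] are hypothetical choices.

```python
# Hypothetical taxonomy: child -> parent (the root "animal" has no entry).
parent = {"sparrow": "bird", "swallow": "bird", "horse": "mammal",
          "bird": "animal", "mammal": "animal"}

def ancestors(node):
    """Path from a node up to the root, the node itself included."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def depth(node):
    return len(ancestors(node)) - 1

leaves = ["sparrow", "swallow", "horse"]
max_depth = max(depth(leaf) for leaf in leaves)

def similarity(a, b):
    """Depth of the lowest common ancestor, scaled to [0, 1]."""
    common = next(n for n in ancestors(a) if n in ancestors(b))
    return depth(common) / max_depth
```

The resulting similarity is transitive with respect to the minimum, so that one minus the similarity satisfies the ultrametric inequality for every triple of leaves.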

Classification hierarchies such as those may be thought of as sets of general rules, having a particularly
useful structure, that specify interset proximity from relevant, but restricted viewpoints,
eventually providing measures of similarity between variable values (i.e., the "leaves" of the taxonomical
tree). More generally, however, we may expect that sets of possibilistic rules (i.e., a general
knowledge base) defining a general semantic network of corelevance relations may be available as
the source for the determination of interobject proximity. These possibilistic semantic networks
resemble conventional semantic networks in most regards, being more general in that, in addition
to specifying knowledge about system behavior in some subsets of state-space,¹³ they also specify
characteristics of behavior in neighborhoods of those subsets.

We may think, therefore, that the antecedents of implicational rules define general regions in state
space where existence of relevant knowledge may increase insight through application of inferential
rules. Using Zadeh's terminology, these antecedents define "granules" that identify important regions
of state-space and indicate the level of accuracy (or granularity) that is required to perform effective
system analysis. In this case, the possibilistic granules correspond to fuzzy sets that are used to
specify both what is true in the core of the granule and, with decreasing specificity, what is true
in a nested set (i.e., the $\alpha$-cuts) of its neighborhoods. The ability to specify behavior using such
a topological structure results in inferential gains that are the direct consequence of our ability
to reason by similarity, an ability that is made possible by the approximate matching property
of the generalized modus ponens. From yet another perspective, the fuzzy granules identified by
possibilistic rules may also be thought of as generalizations of the arbitrary variable sets used in
a variety of artificial intelligence efforts aimed at understanding system behavior using qualitative
descriptions of reality [16].

13 The expression "state-space" is loosely used here to indicate the space defined by all system variables.

A number of heuristics may be easily formulated to integrate “marginal” measures of resemblance 
into joint similarity relations. More generally, however, we may state the problem of similarity 
construction as that of defining metric structures on the basis of knowledge of the aspects of system 
behavior that are important to its understanding—i.e., the previously mentioned granules, which 
define what must be distinguished. Since generally those granules are fuzzy sets, the relevance to 
similarity construction of the following representation theorem, due to Valverde, may be immediately 
seen: 


Theorem [Valverde]: A binary function $S$ mapping pairs of objects of a universe of discourse $U$
into $[0,1]$ is a similarity relation if and only if there exists a family $\mathcal{H}$ of fuzzy subsets of $U$ such
that

\[ S(w, w') = \inf_{h \in \mathcal{H}} \left\{ \min\left[ h(w) \oslash h(w'), \; h(w') \oslash h(w) \right] \right\}, \]

for all $w$ and $w'$ in $U$, where the infimum is taken over all fuzzy subsets $h$ in the family $\mathcal{H}$.
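The constructive direction of the representation theorem can be sketched for a finite universe. The example assumes the Lukasiewicz T-norm and its pseudoinverse; the universe and the family of fuzzy subsets are hypothetical.

```python
def t_norm(a, b):
    return max(a + b - 1.0, 0.0)      # Lukasiewicz T-norm

def residuum(a, b):
    return min(1.0, 1.0 - b + a)      # a (/) b = sup{c : t_norm(b, c) <= a}

# Hypothetical universe and family of fuzzy subsets (membership tables).
universe = ["w1", "w2", "w3"]
family = [{"w1": 1.0, "w2": 0.75, "w3": 0.0},
          {"w1": 0.5, "w2": 0.5, "w3": 1.0}]

def similarity(w, v):
    """S(w, v) = inf over h of min[h(w) (/) h(v), h(v) (/) h(w)]."""
    return min(min(residuum(h[w], h[v]), residuum(h[v], h[w]))
               for h in family)
```

For this T-norm each term reduces to one minus the absolute difference of memberships, so two worlds are as similar as the fuzzy set that separates them most allows. The function obtained is reflexive, symmetric and transitive with respect to `t_norm`.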


Besides its obvious relevance to the generation of similarity relations from knowledge of important
sets in the domain of discourse, Valverde's theorem, resulting originally from studies in pattern
recognition, is also of potential significance to the solution of knowledge acquisition problems because
of the important relations that exist between learning procedures and structure-discovery
techniques such as cluster analysis.

7 CONCLUSION


This note has presented a similarity-based model that provides a clear interpretation of the major
structures and methods of possibilistic logic using metric concepts that are formally different from
the set-measure constructs of probability theory. Regardless of the potential existence, so far
unestablished, of probability-based interpretations for possibilistic structures, this metric model makes
clear that there are no compelling reasons to conflate two rather different aspects of uncertainty into
a single notion simply because one's favorite theoretical framework, in spite of its otherwise many
remarkable virtues, fails to fully capture reality.

Succinctly stated, being in a situation that resembles a state of affairs $S$ does not make $S$ likely or
vice versa. Furthermore, our reference state may not even be possible in the current circumstances
(making it completely unlikely), but we may still find it useful as a comparison landmark. This
use of "impossible" examples as a way to illustrate system behavior is very prevalent in human
culture, being exemplified by such utterances as "he had the strength of a horse and the swiftness
of a swallow," even if it is obvious to all that no such beasts exist other than for such metaphorical
purposes.

The insight provided by this model makes it rather obvious that very little can be gained by 
continuing to assert a potential—although never revealed—encompassing probabilistic interpretation 
for possibilistic structures that, presumably, would render them unnecessary as serious objects of 
scientific discourse. In addition, and quite beyond whatever understanding theory may provide, the 
current success of possibilistic logic as the basis for major systems of important human value [41] 
—often unmatched by other approaches—should be enough to convince those having more pragmatic 
perspectives as to its utility. 

The task for approximate reasoning researchers is to proceed now beyond unnecessary controversy
into the study of the issues that arise from models such as the one presented in this note. Among
such questions, further studies of the relations between the notions of possibility, similarity, and
negation and of those between probability and possibility are of major importance.

INTRODUCTION

In this brief communication, we summarise the results of recent research on the conceptual foundations of
fuzzy logic [5]. This research resulted in the formulation of several semantic models that interpret the major
concepts and structures of fuzzy logic in terms of the more primitive notion of resemblance and similarity
between "possible worlds," i.e., the possible states, situations or behaviors of a real-world system. The metric
structures representing this notion of proximity are generalizations of the accessibility relation of modal
logics (1).

Possibilistic reasoning methods may be characterized, by means of our interpretation, as approaches to the
description of the relations of proximity that hold between possible system states that are logically consistent
with existing evidence, and other situations, which are used as reference landmarks. By contrast, probabilistic
methods seek to quantify, by means of measures of set extension, the proportion of the set of possible worlds
where a proposition is true.

Our discussion will focus primarily on the principal characteristics of a model, discussed in detail in a recent
technical note [2], that quantifies resemblance between possible worlds by means of a similarity function that
assigns a number between 0 and 1 to every pair of possible worlds. Introduction of such a function permits
the interpretation of the major constructs and methods of fuzzy logic (conditional and unconditional possibility and
necessity distributions and the generalized modus ponens of Zadeh) on the basis of related metric relationships
between subsets of possible worlds.

THE APPROXIMATE REASONING PROBLEM 

Our semantic model of fuzzy logic is based on two major conceptual structures: the notion of possible
world, which is the basis for our unified view of the approximate reasoning problem [3], and a metric structure
that quantifies similarity between pairs of possible worlds.

If a reasoning problem is thought of as being concerned with the determination of the truth-value of a set
of propositions that describe different aspects of the behavior of a system, then a possible world is simply a
function (called a valuation) that assigns a unique truth value to every proposition in that set and that, in
addition, is consistent with the rules of propositional logic. The set of all such possible worlds is called the
universe of discourse.

In any reasoning problem, knowledge about the characteristics of the class of systems being studied
combined with observations about the particular system under consideration restricts the extent of possible
worlds that must be considered to a subset of the universe of discourse, called the evidential set, which will
be denoted $\mathcal{E}$.
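The notions of valuation, universe of discourse and evidential set can be sketched for a toy problem. The propositions, the rules and the observation are hypothetical; the last helper reports whether all evidential worlds agree on the truth value of a proposition.

```python
from itertools import product

propositions = ["rain", "wet", "sprinkler"]   # hypothetical atomic propositions

# A possible world (valuation) assigns a truth value to every proposition.
universe = [dict(zip(propositions, values))
            for values in product([False, True], repeat=len(propositions))]

def evidence(world):
    """Hypothetical knowledge: rain or the sprinkler wets the grass,
    and the grass is observed to be wet."""
    rules_hold = ((not world["rain"] or world["wet"]) and
                  (not world["sprinkler"] or world["wet"]))
    return rules_hold and world["wet"]

evidential_set = [w for w in universe if evidence(w)]

def status(hypothesis):
    """Classify a hypothesis against the evidential set."""
    truths = {w[hypothesis] for w in evidential_set}
    if truths == {True}:
        return "implied"
    if truths == {False}:
        return "refuted"
    return "undetermined"
```

Here `status("wet")` is `"implied"`, while `status("rain")` is `"undetermined"`, since rain holds in some evidential worlds and fails in others.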

The purpose of the inferential procedures utilized in any reasoning problem may be characterized as that
of establishing if, for a given proposition $p$, either $\mathcal{E} \Rightarrow p$ or $\mathcal{E} \Rightarrow \neg p$, i.e., whether existing evidence implies the
hypothesis or its negation. In approximate reasoning problems, as illustrated in Figure 1, such determination
is, by definition, impossible: there are some possible worlds in the evidential set where the hypothesis is
true and some where it is false.

Figure 1: The Approximate Reasoning Problem


SIMILARITY FUNCTIONS AND GENERALIZED IMPLICATION 

In the view of fuzzy logic proposed by our model the purpose of possibilistic methods is the description of 
the evidential set by characterization of the resemblance relations that hold between its elements and elements 
of other sets used as reference landmarks. 

To represent similarity or resemblance between possible worlds we introduce a binary function $S$ that
assigns a value between 0 and 1 to every pair of possible worlds $w$ and $w'$. A value of $S$ equal to 1 means that
$w$ and $w'$ are identical while a value of $S$ equal to 0 indicates that knowledge of propositions that are true in
one possible world does not provide any indication about the nature of the propositions that are true in the
other.

In addition to the above requirement of reflexivity, i.e. 5(tv,u>) = 1, we will need to impose additional 
axioms to assure that S captures the semantics of a similarity relation. In addition to assuming that 5 
is symmetric, i.e., S(tv,tv') = 5(tv',tv), we will also require that 5 satisfies a form of transitivity that is 
mrtivated by noting that if tv, tv' and tv" are possible worlds and if tv is highly similar to u>' and tv' is highly 
similar to tv", then it would be surprising if tv and tv" were highly dissimilar. This consideration indicates 
that knowledge of S( to, tv') and S(iv',tv") should provide a lower bound for values of 5(tv, tv"), as expressed 
by the inequality 

S(tv,iv") > 5(tv,tv')®5(tv',iv"), 

where $\circledast$ is a binary operator used to represent a real function that produces the required bound. If reasonable
requirements are imposed upon the function $\circledast$, it is easy to show that it has the properties of triangular norms:
a class of functions that play a major role in multivalued logics [4].

The generalized transitivity property expressed by the above inequality may be easier to understand as
a classical triangular inequality if it is noted that the function $\delta = 1 - S$ has the properties of a metric.
When $\circledast$ is the Lukasiewicz norm $a \circledast b = \max(a + b - 1, 0)$, the transitivity property of S is equivalent
to the well-known triangular property of distance functions. If $\circledast$ corresponds to the Zadeh triangular norm
$a \circledast b = \min(a, b)$, then $\delta$ may be shown to satisfy the more stringent ultrametric inequality.
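The relation between generalized transitivity and the triangular inequality can be checked numerically. The sketch below assumes one particular similarity function, the fraction of propositions on which two worlds agree; this choice and all function names are illustrative assumptions, not definitions taken from the report.

```python
from itertools import product

def lukasiewicz(a, b):
    """Lukasiewicz triangular norm."""
    return max(a + b - 1.0, 0.0)

def zadeh(a, b):
    """Zadeh (minimum) triangular norm."""
    return min(a, b)

N = 4  # number of atomic propositions (assumed)
worlds = list(product([0, 1], repeat=N))

def similarity(w1, w2):
    """Assumed similarity: fraction of propositions on which two worlds agree."""
    return sum(x == y for x, y in zip(w1, w2)) / N

def delta(w1, w2):
    """Dissimilarity obtained as the numerical complement of the similarity."""
    return 1.0 - similarity(w1, w2)

def is_transitive(S, tnorm, worlds, tol=1e-12):
    """Check S(u, w) >= S(u, v) (*) S(v, w) for every triple of worlds."""
    return all(S(u, w) >= tnorm(S(u, v), S(v, w)) - tol
               for u in worlds for v in worlds for w in worlds)

def satisfies_triangle_inequality(worlds, tol=1e-12):
    """Check delta(u, w) <= delta(u, v) + delta(v, w) for every triple of worlds."""
    return all(delta(u, w) <= delta(u, v) + delta(v, w) + tol
               for u in worlds for v in worlds for w in worlds)

print(is_transitive(similarity, lukasiewicz, worlds))
print(satisfies_triangle_inequality(worlds))
print(is_transitive(similarity, zadeh, worlds))
```

For this particular similarity the Lukasiewicz form of transitivity holds while the stronger Zadeh form fails, so the associated dissimilarity is a metric but not an ultrametric.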

The correspondence between propositions and subsets of possible worlds simplifies the interpretation of
the classical rule of modus ponens as a rule of derivation based on the transitive property of set inclusion. If
three propositions p, q and r are such that the set of possible worlds where p is true is a subset of the set of
possible worlds where q is true, and if that set is itself a subset of the set of worlds where r is true, then the
modus ponens simply states that the set of p-worlds is a subset of the set of r-worlds.

The conventional relation of set inclusion, based on the binary truth-value structure of classical logic,
allows one to state only that a set of possible worlds is a subset of another or that it is not. Introduction of
a metric structure in the universe of discourse, however, permits the quantification of the degree by which
a set is included in another. Every set of possible worlds, as illustrated in Figure 2, is a subset of some
neighborhood of any other set. The minimal amount of "stretching" that is required to include a set of possible
worlds q in a neighborhood of a set of possible worlds p, given by the expression

$$I(p \mid q) = \inf_{w' \vdash q} \sup_{w \vdash p} S(w, w'),$$

where $w \vdash p$ indicates that p is true in the possible world w, is called the degree of implication.

Figure 2: Degree of implication

The degree of implication function has the important transitive property expressed by

$$I(p \mid q) \geq I(p \mid r) \circledast I(r \mid q),$$

which is the basis of the generalized modus ponens of Zadeh. As illustrated in Figure 3, this important rule of
derivation tells us how much the set of p-worlds should be stretched to encompass q on the basis of knowledge
of the sizes of the neighborhoods of p that include r and of r that include q.



Figure 3: The generalized modus ponens 

A notion dual to the degree of implication is that of degree of consistence, which quantifies the amount
by which a set must be stretched to intersect another, and that is given by the expression

$$C(p \mid q) = \sup_{w' \vdash q} \sup_{w \vdash p} S(w, w').$$
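For finite sets of worlds the infima and suprema in the two definitions reduce to minima and maxima, so both degrees can be computed directly. The following sketch assumes the agreement-fraction similarity and the Lukasiewicz norm; the propositions `p`, `q`, and `r` are arbitrary illustrative choices.

```python
from itertools import product

N = 3  # number of atomic propositions (assumed)
worlds = list(product([0, 1], repeat=N))

def S(u, v):
    """Assumed similarity: fraction of propositions on which two worlds agree."""
    return sum(x == y for x, y in zip(u, v)) / N

def tnorm(a, b):
    """Lukasiewicz triangular norm, matching the metric nature of S above."""
    return max(a + b - 1.0, 0.0)

def degree_of_implication(p, q):
    """I(p|q): worst case over q-worlds of the similarity to the closest p-world."""
    return min(max(S(w, v) for w in p) for v in q)

def degree_of_consistence(p, q):
    """C(p|q): similarity of the closest pair made of a p-world and a q-world."""
    return max(S(w, v) for w in p for v in q)

# Propositions are represented by the (nonempty) sets of worlds where they hold.
p = [w for w in worlds if w[0] == 1]
q = [w for w in worlds if w[0] == 0 and w[1] == 1]
r = [w for w in worlds if w[1] == 1]

print(degree_of_implication(p, q), degree_of_implication(q, p))
print(degree_of_consistence(p, q))

# Transitivity I(a|c) >= I(a|b) (*) I(b|c), checked on the three propositions.
props = [p, q, r]
transitive = all(
    degree_of_implication(a, c)
    >= tnorm(degree_of_implication(a, b), degree_of_implication(b, c)) - 1e-12
    for a in props for b in props for c in props)
print(transitive)
```

Note that the degree of implication is not symmetric in its arguments, whereas the degree of consistence is.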

POSSIBILISTIC DISTRIBUTIONS

Although the transitive property of the degree of implication essentially provides the basis for the
conceptual validity of the generalized modus ponens, this rule of derivation is typically expressed by means of
necessity and possibility distributions.

An unconditioned necessity distribution given the evidence $\mathcal{E}$ is any function defined over propositions
that bounds from below the degree of implication function, i.e., any function satisfying the inequality
$\mathrm{Nec}(p) \leq I(p \mid \mathcal{E})$. Correspondingly, an unconditioned possibility distribution is any upper bound for the degree of
consistence function, i.e., $\mathrm{Poss}(p) \geq C(p \mid \mathcal{E})$.

The definition of conditional possibility and necessity distributions makes use of a form of inverse of the
triangular norm, denoted $\oslash$ and defined by the expression

$$a \oslash b = \sup\{c : b \circledast c \leq a\}.$$

Using this function, it is possible to define conditional possibilistic distributions as follows: 
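Definition: A function $\mathrm{Nec}(\cdot \mid \cdot)$ is called a conditional necessity distribution for $\mathcal{E}$ if

$$\mathrm{Nec}(q \mid p) \leq \inf_{w \vdash \mathcal{E}} \bigl( I(q \mid w) \oslash I(p \mid w) \bigr).$$

Definition: A function $\mathrm{Poss}(\cdot \mid \cdot)$ is called a conditional possibility distribution for $\mathcal{E}$ if

$$\mathrm{Poss}(q \mid p) \geq \sup_{w \vdash \mathcal{E}} \bigl( I(q \mid w) \oslash I(p \mid w) \bigr).$$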
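The inverse of the triangular norm used in these definitions has simple closed forms for the common norms. The sketch below states those closed forms (standard results, not given in the report) and compares them against a brute-force evaluation of the defining supremum on a finite grid; exact rational arithmetic is used so that boundary cases are compared without rounding error. All names are illustrative.

```python
from fractions import Fraction

def t_lukasiewicz(a, b):
    return max(a + b - 1, Fraction(0))

def t_zadeh(a, b):
    return min(a, b)

def t_product(a, b):
    return a * b

def residuum_lukasiewicz(a, b):
    """Closed form of sup{c : max(b + c - 1, 0) <= a}."""
    return min(Fraction(1), 1 - b + a)

def residuum_zadeh(a, b):
    """Closed form of sup{c : min(b, c) <= a}."""
    return Fraction(1) if b <= a else a

def residuum_product(a, b):
    """Closed form of sup{c : b * c <= a}; b > a >= 0 in the division branch."""
    return Fraction(1) if b <= a else a / b

# Finite grid on [0, 1] used for the brute-force comparison.
GRID = [Fraction(k, 20) for k in range(21)]

def residuum_by_search(tnorm, a, b):
    """Largest grid value c such that b (*) c <= a (c = 0 always qualifies)."""
    return max(c for c in GRID if tnorm(b, c) <= a)

a, b = Fraction(1, 2), Fraction(3, 4)
print(residuum_lukasiewicz(a, b), residuum_zadeh(a, b), residuum_product(a, b))
print(all(residuum_by_search(t_lukasiewicz, x, y) == residuum_lukasiewicz(x, y)
          for x in GRID for y in GRID))
```

The grid search is only a sanity check: it agrees exactly with the closed forms for the Lukasiewicz and Zadeh norms, whose inverses always fall on the grid, but it can only approximate the product case from below.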

GENERALIZED MODUS PONENS 

The compositional rule of inference or generalized modus ponens of Zadeh is a generalization of the
corresponding classical rule of inference that may be used even when known facts do not match the antecedent
of a conditional rule. The interpretation provided by our model explains the generalized modus ponens as
an extrapolation procedure that uses knowledge of the similarity between the evidence and a set of possible
worlds p (the antecedent proposition), and of the proximity of p-worlds to q-worlds, to bound the similarity of the
latter to the evidential set. The actual statement of the generalized modus ponens for necessity distributions
in terms of similarity structures makes use of a family $\mathcal{P}$ of satisfiable propositions that partitions the universe
of discourse:

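Theorem (Generalized Modus Ponens for Possibility Functions): Let $\mathcal{P}$ be a partition and let q be a
proposition. If Poss(p) and Poss(q|p) are real values, defined for every proposition p in $\mathcal{P}$, such that

$$\mathrm{Poss}(p) \geq C(p \mid \mathcal{E}), \qquad \mathrm{Poss}(q \mid p) \geq \sup_{w \vdash \mathcal{E}} \bigl( I(q \mid w) \oslash I(p \mid w) \bigr),$$

then the following inequality is valid:

$$\sup_{p \in \mathcal{P}} \bigl[ \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \bigr] \geq C(q \mid \mathcal{E}).$$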

A dual result holds for necessity functions. 
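The inequality of the theorem can be verified exhaustively on a small universe. The sketch below assumes the agreement-fraction similarity, the Lukasiewicz norm and its inverse, and the tightest admissible values of Poss(p) and Poss(q|p); the evidential set and the partition are arbitrary illustrative choices, and the function names are hypothetical.

```python
from itertools import product

N = 3  # number of atomic propositions (assumed)
worlds = list(product([0, 1], repeat=N))

def S(u, v):
    """Assumed similarity: fraction of propositions on which two worlds agree."""
    return sum(x == y for x, y in zip(u, v)) / N

def tnorm(a, b):
    """Lukasiewicz triangular norm."""
    return max(a + b - 1.0, 0.0)

def residuum(a, b):
    """Inverse of the Lukasiewicz norm: sup{c : tnorm(b, c) <= a}."""
    return min(1.0, 1.0 - b + a)

def closeness(prop, w):
    """I(prop | w): similarity between the world w and the nearest prop-world."""
    return max(S(v, w) for v in prop)

def consistence(prop, evidence):
    """C(prop | E), used here as the tightest unconditioned possibility."""
    return max(closeness(prop, w) for w in evidence)

def tight_conditional_possibility(q, p, evidence):
    """Smallest value allowed for Poss(q|p) by the definition above."""
    return max(residuum(closeness(q, w), closeness(p, w)) for w in evidence)

def gmp_bound(q, partition, evidence):
    """Left-hand side of the theorem: sup over the partition of Poss(q|p) (*) Poss(p)."""
    return max(tnorm(tight_conditional_possibility(q, p, evidence),
                     consistence(p, evidence))
               for p in partition)

def nonempty_subsets(items):
    """Enumerate every nonempty subset of a list by bit mask."""
    for mask in range(1, 2 ** len(items)):
        yield [x for i, x in enumerate(items) if (mask >> i) & 1]

# Assumed evidence and partition (illustrative choices).
evidence = [w for w in worlds if w[0] == 1 and w[1] == 0]
partition = [[w for w in worlds if w[2] == 0],
             [w for w in worlds if w[2] == 1]]

holds = all(gmp_bound(q, partition, evidence) >= consistence(q, evidence) - 1e-12
            for q in nonempty_subsets(worlds))
print(holds)
```

The bound produced by the rule may be loose for particular propositions; the check only confirms that it never falls below the degree of consistence it is meant to bound.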
Abstract

This paper addresses a number of fundamental issues on the nature of the concepts 
and structures of fuzzy logic, focusing, in particular, on the conceptual and functional 
differences that exist between probabilistic and possibilistic approaches. 

A semantic model provides the basic framework allowing definition of possibilistic
structures and concepts by means of a function that quantifies proximity, closeness, or
resemblance between pairs of possible worlds. The resulting model is a natural extension,
based on multiple conceivability relations, of the modal logic concepts of necessity
and possibility. By contrast, typical, chance-oriented, probabilistic concepts and structures
rely on measures of set extension that quantify the proportion of possible worlds
where a proposition is true.

Resemblance between possible worlds is quantified by a generalized similarity relation,
i.e., a function that assigns a number between 0 and 1 to every pair of possible
worlds. Using this similarity relation, which is a form of numerical complement of a classic
metric or distance, the major constructs and methods of fuzzy logic (conditional
and unconditional possibility and necessity distributions and the generalized modus
ponens of Zadeh) are defined and interpreted.
1 Introduction


In this paper, we present a semantic model of the major concepts, structures, and methods 
of fuzzy or possibilistic [16,17] logic. This model is based on a framework that combines 
the notion of possible world [2] (i.e., a potential state or situation of a real-world system) 
with measures of proximity or resemblance between pairs of possible worlds. The resulting 
structures are substantially different in character and nature from those of probabilistic 
reasoning, which are based on measures of set extension, used to quantify the proportion of 
possible worlds where a given proposition is true. 

The results reported in this paper are the latest in a continuing investigative effort aimed
at clarifying basic conceptual similarities and differences between a number of approaches
to the treatment of imprecision and uncertainty. Also using possible-world semantic models,
prior research has established that the Dempster-Shafer calculus of evidence may be interpreted
by structures that result from the combination of conventional probabilistic calculus
with epistemic logics [9]. By contrast, the formal structures discussed herein clearly show
that fuzzy logic may be understood in a straightforward fashion using conventional metric
notions in a space of possible worlds without resorting in any form to probabilistic concepts.
Furthermore, the actual functions that are used to combine possibilistic knowledge
are substantially different from those used in the probability calculus.

Our exposition, which will be limited to the major structures of fuzzy logic, defines
possibilistic concepts using a more primitive notion that has been found to be an essential
component of important human cognitive processes [14]. The notion of similarity, in spite
of its importance in reasoning processes, has not received substantial attention in treatments
based on the use of logical concepts.

Perhaps as a consequence of its reliance on methods for the manipulation of symbolic
strings and on a single (partial order) relation between formulas (i.e., implication) as the
basis for almost all of its techniques and procedures, there has been little attention given in
formal logic to the consideration of other formal structures that capture important features
of human knowledge such as the resemblance that exists between situations or circumstances.
Although, for example, stating that Mary is worth $1,000,000 as opposed to saying that
she is worth $999,999 may be rather inconsequential in terms of the implication of either
statement to a decision-maker (e.g., one trying to establish a credit line), there is nothing in the
basic framework of logic that makes the second statement any closer to the first than the
statement that Mary is broke (i.e., none of the three statements is logically consistent with
either of the other two).

The determination and use of similarity information is, however, not only central to all 
forms of analogical reasoning but it is an essential element in the derivation of physical law. 
Formal studies in measurement theory [7] clearly show the role that measures based on similar 
behavior play in the derivation of rational measurement schemes, while also explaining the 
ubiquitous presence of numeric scales throughout science. 

The results presented in this paper show that, when such notions of proximity are formalized
in the context of a possible-worlds model, the major functional structures of fuzzy
logic (possibility and necessity distributions) and its major inferential procedure (the generalized
modus ponens of Zadeh) may be readily explained as a natural extension of classical
logical concepts. In particular, possibility and necessity distributions simply correspond to
best and worst scenarios in a space of possible real-world states, while the generalized modus
ponens [17] is a sound inferential procedure that may be regarded as a form of logical extrapolation
between neighboring situations.

The scope of this paper prevents a detailed discussion of all pertinent results and derivations.
A complete account of all relevant matters regarding the similarity-based model of
fuzzy logic described here is given in a related technical note [10], which this paper
essentially summarizes.

2 The Approximate Reasoning Problem 

Our model of the approximate reasoning problem is based on the notion of “possible world.” 
Informally, possible worlds are the conceivable states of affairs of a real-world system that 
are consistent with the laws of logic. 

Restricting ourselves, for the sake of simplicity, to propositional formulations, a possible
world is a function [2] that assigns a unique conventional truth value (i.e., true or false) to
every proposition that describes some relevant aspect of the state of the system and that,
in addition, satisfies the axioms of propositional logic.

In the absence of any knowledge about the behavior of a system of interest or of any
observation about its state, it is impossible to determine which, among all conceivable situations,
corresponds to the actual state of the real world. Availability of factual evidence
or determination of the laws of behavior of the system permits, however, the elimination of some
possible worlds in this universe of discourse from consideration. The remaining possible
worlds correspond to satisfiable propositions that, in addition, are logically consistent with
the evidence. This subset of conceivable situations or scenarios will be called the evidential
set, denoted $\mathcal{E}$.

If the typical reasoning problem is thought of as the determination of the truth value of 
a proposition h (the hypothesis), then an approximate reasoning problem may be described 
as one where available evidence does not permit such evaluation without ambiguity. In other 
words, as illustrated in Figure 1, there are some members of the evidential set where the 
hypothesis is true and some where it is false. 

Our approach to the formalization of the major concepts and structures of fuzzy logic
is based on a generalization of a central concept of semantic models of modal
logics. Modal logics [4] may be generally described as extensions of conventional two-valued
logic that permit the meaning of propositional truth to be qualified in various ways.

In our model, we utilize modal concepts to explain basic possibilistic structures using
the more primitive notion of similarity. This notion is introduced, however, by means of
conventional set-theoretic and logical concepts. In this regard, our approach to the study
of the interplay of modal and possibilistic logics is different from approaches such as that
used by Lakoff [6], who sought to generalize modal logics using fuzzy-set concepts, or that of
Dubois and Prade [3], who investigated modal structures with a view to the development of
formal proof mechanisms in possibilistic logic.

Figure 1: The Approximate Reasoning Problem.

A major concept of semantic models of modal logic systems is a binary relation R, called
the accessibility or conceivability relation. This relation is assumed to have a number of 
properties intended to capture the semantics of various qualifications of propositional truth, 
ranging from logical necessity through the state of knowledge of rational agents to concepts 
related to the ideal behavior of ethical decision-makers. 

Our aim is to characterize the extent by which statements that are true in one situation 
or scenario may be said, perhaps with some suitable modification, to be true in another 
state of affairs that resembles it. We are particularly interested in describing more general 
(i.e., less specific) propositions that are true in one possible world as a function of the 
propositions that are true in another. In order to model a continuous range of proximity 
between possible worlds, we will generalize the notion of accessibility relation to a full family
of binary relations $R_\alpha$, indexed by a numerical parameter $\alpha$ taking values between 0 and 1,
along the same lines (albeit with a different purpose) utilized by Lewis in his treatment of
counterfactuals [5].

3 Similarity and Graded Possibility 

We will introduce a family of accessibility relations

$$\{R_\alpha : \alpha \in [0, 1]\},$$

by means of a binary function S, called the similarity relation, that maps pairs of possible
worlds into numbers between 0 and 1. The multiple relations of accessibility $R_\alpha$ are defined
in terms of this similarity function by

$$w \, R_\alpha \, w' \quad \text{if and only if} \quad S(w, w') \geq \alpha, \qquad \alpha \in [0, 1].$$
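The family of accessibility relations can be materialized explicitly for a small universe by thresholding the similarity function. In the following sketch the similarity (fraction of agreeing propositions), the chosen levels, and the function name `accessibility` are illustrative assumptions.

```python
from itertools import product

N = 3  # number of atomic propositions (assumed)
worlds = list(product([0, 1], repeat=N))

def S(u, v):
    """Assumed similarity: fraction of propositions on which two worlds agree."""
    return sum(x == y for x, y in zip(u, v)) / N

def accessibility(alpha):
    """R_alpha as the set of ordered pairs (w, w') with S(w, w') >= alpha."""
    return {(u, v) for u in worlds for v in worlds if S(u, v) >= alpha}

# The only values S can take here are 0, 1/3, 2/3 and 1.
levels = [0.0, 1 / 3, 2 / 3, 1.0]
relations = {alpha: accessibility(alpha) for alpha in levels}

# Every R_alpha is reflexive and symmetric because S(w, w) = 1 and S is symmetric.
reflexive = all((w, w) in rel for rel in relations.values() for w in worlds)
symmetric = all((v, u) in rel for rel in relations.values() for (u, v) in rel)

# Raising alpha can only remove pairs, so the relations are nested.
nested = all(relations[hi] <= relations[lo]
             for lo, hi in zip(levels, levels[1:]))

print([len(relations[alpha]) for alpha in levels])
print(reflexive, symmetric, nested)
```

At the level 1 the relation reduces to identity between worlds, while at the level 0 every world is accessible from every other.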

The function S is intended to capture a notion of proximity, closeness, or resemblance
between possible worlds, with a value of 1 corresponding to the identity of possible worlds
and a value of 0 indicating that knowledge of propositions that are true in one possible world
does not provide any indication of the propositions that are true in the other. To assure that
the function S has the semantics of a relation that quantifies resemblance between possible
states of affairs, it is necessary to require that it satisfy a number of properties.

Besides the above-mentioned property that the similarity between a possible world and
itself has the highest possible value, equivalent to stating that each accessibility relation
$R_\alpha$ is reflexive, we will also require that the similarity between different possible worlds be
strictly less than one. This requirement is intended to assure that the similarity relation
can distinguish between different possible worlds.

The similarity relation will also be assumed to be symmetric, and to satisfy a relaxed
form of transitivity. Clearly, if the pairs of possible worlds (w, w') and (w', w'') correspond
to highly similar situations, it would be surprising if w and w'' were highly dissimilar. It is
natural to assume, therefore, that

$$S(w, w'') \geq S(w, w') \circledast S(w', w''),$$

where $\circledast$ is a binary operator used to represent the lower bound as a function of its arguments.
This requirement is equivalent to the relaxed transitivity condition

$$R_{\alpha \circledast \beta} \supseteq R_\alpha \circ R_\beta,$$

which replaces the usual, more stringent, definition of transitivity.
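The equivalence between the inequality for S and the relaxed transitivity of the accessibility relations can be spelled out in a few lines. The derivation below is a sketch that assumes the operator is nondecreasing in each argument, one of the "reasonable requirements" that make it a triangular norm; the composition symbol denotes the ordinary composition of binary relations.

```latex
% From the inequality to relaxed transitivity:
% suppose w R_\alpha w' and w' R_\beta w''.
\begin{align*}
  S(w, w') \geq \alpha,\quad S(w', w'') \geq \beta
  &\;\Longrightarrow\;
  S(w, w'') \;\geq\; S(w, w') \circledast S(w', w'')
  \;\geq\; \alpha \circledast \beta \\
  &\;\Longrightarrow\; w \, R_{\alpha \circledast \beta} \, w'' .
\end{align*}
% Conversely, choosing \alpha = S(w, w') and \beta = S(w', w'') gives
% w R_\alpha w' and w' R_\beta w'', so relaxed transitivity yields
% w R_{\alpha \circledast \beta} w'', i.e.
\[
  S(w, w'') \;\geq\; S(w, w') \circledast S(w', w'') .
\]
```

When the operator is the minimum and both levels coincide, the condition reduces to ordinary transitivity of each individual relation.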

Imposition of reasonable requirements upon the function $\circledast$ shows that it has the properties
of a triangular norm [11]. These functions, which play a significant role in multivalued
logics [12], may be justified, therefore, purely on the basis of metric considerations. Important
examples of triangular norms are

$$a \circledast b = \min(a, b), \qquad a \circledast b = \max(a + b - 1, 0), \qquad \text{and} \qquad a \circledast b = ab,$$

called the Zadeh, Lukasiewicz, and product triangular norms, respectively.
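The defining properties of a triangular norm (commutativity, associativity, monotonicity, and 1 as the identity element) are standard and are not listed in the text; the sketch below checks them for the three examples on a finite grid, using exact rational arithmetic. A grid check is evidence rather than proof, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def zadeh(a, b):
    return min(a, b)

def lukasiewicz(a, b):
    return max(a + b - 1, Fraction(0))

def product_norm(a, b):
    return a * b

# Finite grid on [0, 1]; exact fractions avoid rounding in equality tests.
GRID = [Fraction(k, 10) for k in range(11)]

def is_triangular_norm(t):
    """Check the usual triangular-norm axioms on every grid point."""
    commutative = all(t(a, b) == t(b, a)
                      for a, b in product(GRID, repeat=2))
    associative = all(t(t(a, b), c) == t(a, t(b, c))
                      for a, b, c in product(GRID, repeat=3))
    monotone = all(t(a, b) <= t(a, c)
                   for a, b, c in product(GRID, repeat=3) if b <= c)
    identity = all(t(a, Fraction(1)) == a for a in GRID)
    return commutative and associative and monotone and identity

for name, t in [("Zadeh", zadeh), ("Lukasiewicz", lukasiewicz),
                ("product", product_norm)]:
    print(name, is_triangular_norm(t))

# The three norms are ordered pointwise: Lukasiewicz <= product <= Zadeh.
ordered = all(lukasiewicz(a, b) <= product_norm(a, b) <= zadeh(a, b)
              for a, b in product(GRID, repeat=2))
print(ordered)
```

The pointwise ordering means that a similarity relation that is transitive under the Zadeh norm is also transitive under the product and Lukasiewicz norms, but not conversely.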

The generalized transitivity property that is expressed by triangular norms clarifies their 
relationship to the conventional mathematical concept of metric. If S is a similarity function, 
then the function $\delta = 1 - S$ has the properties of a distance function. When $\circledast$ corresponds
to the Lukasiewicz norm, the transitivity property of S corresponds to the well-known
triangular property of distance functions. If $\circledast$ corresponds to the Zadeh triangular norm,
then $\delta$ may be shown to satisfy the more stringent ultrametric inequality.
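Both correspondences follow by direct substitution of the similarity by one minus the distance. The short derivation below is a sketch of that substitution; the abbreviations for the three pairwise distances are introduced here only to keep the formulas compact.

```latex
% Abbreviations: d = \delta(w, w''), d_1 = \delta(w, w'), d_2 = \delta(w', w'').
% Lukasiewicz norm: S(w, w'') >= max(S(w, w') + S(w', w'') - 1, 0).
\begin{align*}
  1 - d \;\geq\; \max\bigl((1 - d_1) + (1 - d_2) - 1,\; 0\bigr)
  &\iff 1 - d \;\geq\; 1 - d_1 - d_2
     \quad\text{(the bound } 1 - d \geq 0 \text{ always holds)} \\
  &\iff d \;\leq\; d_1 + d_2 .
\end{align*}
% Zadeh norm: S(w, w'') >= min(S(w, w'), S(w', w'')).
\begin{align*}
  1 - d \;\geq\; \min(1 - d_1,\; 1 - d_2) \;=\; 1 - \max(d_1, d_2)
  &\iff d \;\leq\; \max(d_1, d_2) .
\end{align*}
```

Since the maximum of two nonnegative numbers never exceeds their sum, the ultrametric inequality implies the triangular inequality, which is the sense in which it is more stringent.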
1 Introduction

The notion of similarity, which plays a major role
in human cognitive processes [4], may be used to
formulate a number of semantic models that explain
the major concepts of fuzzy logic [5]. These
formalisms show that possibilistic reasoning is fundamentally
different from approaches based on the
notion of probability: an additive measure of set
extent.

The idea that knowledge of propositions that 
are true in certain situations may be used to derive 
truth-values in similar situations has not received 
much attention in conventional logical treatments. 
This state of affairs may be traced to the reliance 
of logical methods on symbolic procedures that 
only recognize one important relationship between 
formulas, i.e., the partial order defined by implication.

This paper briefly describes one model that explains
possibilistic structures (possibility and necessity
distributions) and the major derivational
rule of fuzzy logic (the generalized modus ponens)
in terms of simpler concepts related to notions
of resemblance between possible worlds. In particular,
the latter procedure is shown to generalize
its classical counterpart by allowing a form
of logical extrapolation between similar situations
and scenarios. A full discussion of this model and
its implications is presented in a related technical
note [2].

2 The Approximate Reasoning Problem 

Our model is based on a unified view of approximate
reasoning methodologies that regards these
procedures as techniques that describe certain properties
of subsets of possible worlds. Informally,
possible worlds are the conceivable states (i.e., scenarios,
situations) of a real-world system that are
consistent with the laws of logic. Restricting ourselves
to propositional formulations, a possible world
is a function that assigns a unique truth value (i.e.,
true or false) to every proposition that describes
a relevant aspect of system state and behavior
and that, in addition, satisfies the axioms of
propositional logic.

The set of all such possible worlds is called the
universe of discourse. Knowledge about the class
of systems being studied, combined with observations
about the actual system under consideration,
usually restricts the set of states that must be considered
in an approximate reasoning problem to a
proper subset of this universe. This subset, denoted
$\mathcal{E}$, is called the evidential set.

In a typical approximate reasoning problem, as
illustrated in Figure 1, available evidence does not
permit one to determine whether a hypothesis of interest is
true or false. Being unable to determine such truth
value, approximate reasoning methods try to describe
significant properties of the evidential set.
Possibilistic techniques describe the relations of
similarity that hold between possible worlds in the
evidential set and possible worlds in other sets,
used as reference landmarks.

3 Similarity Relations 

To capture the notion of proximity or resemblance
between possible worlds, we will introduce a function
S that assigns a number between 0 and 1 to
every pair of possible worlds. This function permits
the definition of a family of relations between possible
worlds that generalizes the classical modal
notion of accessibility [1]. By assumption, S attains
a value of 1 only when its two arguments are
identical. A value of 0, by contrast, is intended to
model the fact that knowledge of propositions that
are true in one possible world does not provide any
indication about propositions that are true in the
other.

Figure 1: The approximate reasoning problem

The similarity relation will also be assumed to
be symmetric and to satisfy a relaxed form of transitivity,
intended to capture the notion that the
similarity between two possible worlds w and w''
bears some relation to the values of the similarities
between each of them and a third world w',
expressed by the inequality

$$S(w, w'') \geq S(w, w') \circledast S(w', w''),$$

where $\circledast$ is a binary operator defined for pairs of
numbers in [0,1]. Imposition of reasonable requirements
upon the function $\circledast$ shows that it has
the properties of a triangular norm [3].

In what follows, we will also need a form of
inverse of the triangular norm $\circledast$, denoted $\oslash$, and
defined by the expression

$$a \oslash b = \sup\{c : b \circledast c \leq a\}.$$

4 Degree of Implication and Degree of Consistence

The classical rule of modus ponens may be thought
of as expressing the transitive property of subset
inclusion. Introducing a metric relation and its associated
topology permits the extension of this relation
by measures that quantify the size of the neighborhood
of a set that contains another set. We
will say that q implies p to the degree $\alpha$ if, for every
q-world w, there exists a p-world w' such that
$S(w, w') \geq \alpha$. Since it is true that $S(w, w') \geq 0$
for every pair of possible worlds, it is obvious that
any proposition implies any other proposition to
some degree.

Informally, the definition of graded implication
means that if p is stretched to the degree $\alpha$, then
this stretched set will include q. The upper bound
of the values $\alpha$ such that q implies p to the degree
$\alpha$, expressed by

$$I(p \mid q) = \inf_{w' \vdash q} \sup_{w \vdash p} S(w, w'),$$

defines a function I called the degree of implication.
The degree of implication, which is related to
the notion of Hausdorff distance, has the transitive
property

$$I(p \mid q) \geq I(p \mid r) \circledast I(r \mid q),$$

which is the basis of the generalized modus ponens
of Zadeh, illustrated in Figure 2.

Figure 2: The generalized modus ponens

A notion that is dual to that of the degree of implication
is the degree of consistence, which quantifies
the amount by which a set must be stretched
in order to intersect another set,

$$C(p \mid q) = \sup_{w' \vdash q} \sup_{w \vdash p} S(w, w').$$

Obviously,

$$I(p \mid q) \leq C(p \mid q).$$
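The remark that the degree of implication is related to the Hausdorff distance can be made explicit. The sketch below assumes the dissimilarity obtained as one minus the similarity and the standard definition of the Hausdorff distance between two sets; the symbols for the dissimilarity and for the Hausdorff distance are introduced here for illustration only.

```latex
% Assumed notation: \delta = 1 - S, and d_H the Hausdorff distance induced by \delta.
\begin{align*}
  1 - I(p \mid q)
    &= \sup_{w' \vdash q} \, \inf_{w \vdash p} \, \delta(w, w')
    && \text{(directed distance from the $q$-worlds to the $p$-worlds)} \\[4pt]
  d_H(p, q)
    &= \max\bigl( 1 - I(p \mid q),\; 1 - I(q \mid p) \bigr)
     = 1 - \min\bigl( I(p \mid q),\; I(q \mid p) \bigr) \\[4pt]
  I(p \mid q)
    &= \inf_{w' \vdash q} \, \sup_{w \vdash p} S(w, w')
     \;\leq\; \sup_{w' \vdash q} \, \sup_{w \vdash p} S(w, w')
     = C(p \mid q)
    && \text{(for satisfiable $q$)} .
\end{align*}
```

The last line is the inequality stated above: an infimum over a nonempty set of q-worlds cannot exceed the supremum over the same set.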

5 Possibility and Necessity Distributions 

An unconditioned necessity distribution for $\mathcal{E}$ is
any function $\mathrm{Nec}(\cdot)$ mapping propositions (i.e.,
subsets of possible worlds) into numbers between 0
and 1, such that

$$\mathrm{Nec}(p) \leq I(p \mid \mathcal{E}),$$

i.e., a lower bound of the degree of implication of p
by $\mathcal{E}$. Correspondingly, an unconditioned possibility
distribution is an upper bound for the degree
of consistence of p and $\mathcal{E}$, i.e.,

$$\mathrm{Poss}(p) \geq C(p \mid \mathcal{E}).$$

Unconditioned necessity and possibility distributions
measure how much a set must be stretched
to enclose or intersect, respectively, the evidential
set. The conditional counterparts of these
notions characterize the proximity relations that
exist between evidential worlds and worlds satisfying
a consequent proposition q as a proportion
of the similarity that exists between those evidential
worlds and worlds that satisfy the antecedent
proposition p.

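A function $\mathrm{Nec}(\cdot \mid \cdot)$ is called a conditional necessity
distribution for $\mathcal{E}$ if

$$\mathrm{Nec}(q \mid p) \leq \inf_{w \vdash \mathcal{E}} \bigl[ I(q \mid w) \oslash I(p \mid w) \bigr].$$

Correspondingly, a function $\mathrm{Poss}(\cdot \mid \cdot)$ is called a
conditional possibility distribution for $\mathcal{E}$ if

$$\mathrm{Poss}(q \mid p) \geq \sup_{w \vdash \mathcal{E}} \bigl[ I(q \mid w) \oslash I(p \mid w) \bigr].$$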

6 The Generalized Modus Ponens 

The usual statement of the compositional rule of
inference or generalized modus ponens of Zadeh [5]
is made in terms of a relationship between unconditioned
and conditioned distributions rather than
in its simpler form, given above, as the transitive
property of the degree of implication.

The generalized modus ponens is a sound logical
extrapolation procedure that uses information
about the metric relations that hold between different
subsets. On the basis of information about
the similarity between evidential worlds and a set
of possible worlds (i.e., the antecedent proposition
p), and of knowledge about the relative proximity
of p-worlds and q-worlds (i.e., conditional
distributions), the generalized modus ponens produces
bounds for the similarity between evidential
worlds and those that satisfy the consequent
proposition q (i.e., unconditioned distributions for
the consequent).

The actual formal statement of the generalized
modus ponens makes use of the notion of partition
of the universe of discourse. A partition $\mathcal{P}$ simply
corresponds to an ordinary partition of the universe
of discourse into disjoint subsets or, equivalently,
to a collection of mutually disjoint propositions
such that their disjunction is always true.

Using this concept, the generalized modus ponens
for possibility distributions may be stated as
follows in terms of distributions defined using similarity
structures:

Theorem: Let $\mathcal{P}$ be a partition and let q be a
proposition. If Poss(p) and Poss(q|p) are real
values, defined for every proposition p in $\mathcal{P}$, such
that

$$\mathrm{Poss}(p) \geq C(p \mid \mathcal{E}),$$

$$\mathrm{Poss}(q \mid p) \geq \sup_{w \vdash \mathcal{E}} \bigl[ I(q \mid w) \oslash I(p \mid w) \bigr],$$

then the following inequality is valid:

$$\sup_{p \in \mathcal{P}} \bigl[ \mathrm{Poss}(q \mid p) \circledast \mathrm{Poss}(p) \bigr] \geq C(q \mid \mathcal{E}).$$

A dual result holds for necessity distributions. 

7 Conclusion 

Similarity models provide useful interpretations for
the basic concepts of possibilistic logic using a more
primitive notion than that of possibility. In addition
to clearly showing that fuzzy logic structures
are not related to probabilistic notions, the resulting
framework provides a solid basis for the study
and extension of possibilistic logic in a number of
directions of considerable practical importance.
1 Introduction

If artificially intelligent systems are to produce adequate
assessments of the state and behavior of
the real world, they must cope with information
and knowledge that is characterized by varying
degrees of uncertainty, ignorance, and correctness.
To address this need, we have developed a technology
called evidential reasoning. It is formally
based upon the Dempster-Shafer theory of belief
functions; it has been implemented as a domain-independent
automated reasoning system; it has
been successfully applied to a range of real-world
problems [2]. Yet, its reliance on belief functions
has drawn criticism.

Our choice of an approach based on the Dempster-Shafer
theory was not arbitrary. We believe that
it has important methodological advantages such
as its ability to represent ignorance in a direct
and straightforward fashion, its consistency with
classical probability theory, its compatibility with
Boolean logic, and its manageable computational
complexity. At the same time, we recognize that
other approaches may also complement and augment
the assessments provided by evidential reasoning.

We will examine, within the limited scope provided
by the format of this paper, several criticisms
of belief functions that have appeared in the
literature. We plan, however, a more thorough discussion
of these criticisms in a related volume to
be published in connection with this conference.

We discuss first the fundamental theoretical bases
supporting the belief-function approach and justify
its use in terms of the requirements imposed
by ignorance of certain probability distributions.
We consider the nature of Dempster's rule of combination
and argue that negative assessments either
misinterpret the nature of the distributions
being combined or ignore the basic independence
assumptions that assure its validity.

We also respond to critiques based on the computational
complexity of the belief-function approach.
Such criticisms claim that the complexity
of probabilistic knowledge representations grows
exponentially with the size of the frame, thus making
the theory unsuited for automated reasoning.
Other comments addressed in our presentation center
on limitations on the representational ability
of belief functions and the lack of certain methodological
capabilities (e.g., decision-making mechanisms).

Despite the criticism that belief functions have 
drawn, we believe evidential reasoning to be well-founded
and to have practical utility in a broad
range of applications. 

2 On Theoretical Soundness 

The theory of belief functions was originated by
Dempster [1] in the context of statistical research.
The use of the term “belief,” together with its
subjectivist connotations, is due to Shafer [7], who
first applied the theory to the analysis of the information
contained in imprecise and uncertain evidence.

Although much skepticism has been voiced about
the naturalness of belief functions and their agreement
with conventional probabilistic approaches,
their theoretical bases are provided by a simple consideration
about the role of evidence as a basic
information carrier.

In classical probabilistic treatments, it is assumed
that, under certain evidential conditions $\mathcal{E}$,
the value $\Pr(p \mid \mathcal{E})$ of the likelihood of a particular
statement p is known. This view of evidence, while
adequate to represent the informational conditions
of most controlled experimental setups, fails
to model adequately the effects that acquiring
similar information has on our state of knowledge
when the state of the world cannot be so
readily controlled.

In such circumstances, whenever the evidence $\mathcal{E}$
is observed, three possible informational outcomes may
result from examination of further information that later
turns out to improve our state of knowledge: either p is
found to be true, $\neg p$ is found to be true (i.e., p is
false), or such information is insufficient to determine
the truth value of p. Use of modal logic concepts, which
are the bases of the formal model of Ruspini [6], suggests
the use of the notation $Kp$, $K\neg p$, and $Ip$ to
identify these outcomes. Since these alternatives are
exclusive and exhaustive, it is clear that

$$\Pr(Kp) + \Pr(K\neg p) + \Pr(Ip) = 1.$$

As shown by Ruspini, the function $\mathrm{Bel}(p) = \Pr(Kp)$
has the properties of a belief function, as axiomatized by
Shafer. Furthermore, since it is possible that
$\Pr(Ip) > 0$, then, in general,
$\mathrm{Bel}(p) + \mathrm{Bel}(\neg p) \le 1$. This
inequality follows naturally, therefore, from classical
probability theory, applied here to considerations about
what Pearl [4] calls the provability of certain propositions.

Similar considerations about the informational 
effect of independent bodies of evidence, which 
are beyond the scope of this short summary, indi¬ 
cate that Dempster's combination formula is, un¬ 
der its stated assumptions, completely consistent 
with conventional probability calculus. 

This interpretation quickly disposes of erroneous
arguments based on unintended interpretations of the
intervals defined by belief functions. Each such interval
represents ignorance of a single probability value for a
fixed proposition p under fixed evidential conditions
$\mathcal{E}$. If critics choose to interpret such intervals
as the possible values that conditional probabilities might
attain when further evidence is collected, as suggested by
Pearl [3], belief functions will not, indeed, behave
according to such unintended semantics.

3 On Decision Support 

A criticism of a more fundamental nature, however, is
often raised regarding the epistemological need for the
belief-function approach. Summarized by statements such as
Pearl’s [4] question, “why we should concern ourselves with
the probability that the evidence implies A, rather than the
probability that A is true, given the evidence?,” these
arguments correctly point to the basic knowledge requirement
that most decision problems entail: if a rational choice is
to be made, then we must have a proper informational basis
to do it.

This obvious consideration is twisted, however, to argue
for the necessity to estimate unknown probability values
when they are not available. We do not think that this
modified, or pragmatic necessity, argument is either sound
or compelling. To answer Pearl’s question, we concern
ourselves with the probability of provability because that
is all that our data and the laws of logic can provide. We
would rather measure the probabilities of truth, and
endeavor to do so whenever possible, but we do not think
that probabilities, any more than any other unknown physical
parameter value, should be guessed simply because we are
compelled to choose a course of action.

In our view, the belief-function approach may 
be used in a straightforward fashion to produce 
intervals of possible utility values. When such in¬ 
tervals overlap and cannot be ordered, this fact 
simply reflects a basic deficiency in our knowl¬ 
edge. We look down upon “pragmatic justifica¬ 
tions” with the same concern that any experimen¬ 
tal scientist shows about proposals to guess what 
he has not measured: the ability to make deci¬ 
sions in the absence of knowledge is, in our view, 
a handicap rather than an advantage of a method. 

4 On the Dempster Formula 

The Dempster formula is, currently, the princi¬ 
pal evidence integration mechanism of the belief- 
function approach. It was derived in the context 
of a basic model of the effect of probabilistic ev¬ 
idence that correctly interprets such evidence as 
constraints on probability values rather than as 
the source of the actual values, which are typically 
undetermined. 

The formula may be described as an expression that yields
bounds for the conditional probability distribution
$\Pr(\cdot \mid \mathcal{E}_1, \mathcal{E}_2)$ on the basis
of similar bounds for the probability distributions
$\Pr(\cdot \mid \mathcal{E}_1)$ and
$\Pr(\cdot \mid \mathcal{E}_2)$, under certain conditions of
independence.

Criticisms about the Dempster formula may be 
broadly characterized as being the consequence of 
two basic misunderstandings about its validity and 
generality. 

First, the formula is intended to be applied only to those
situations where its underlying assumptions are valid.
Alleged counterexamples such as that of the “three prisoner
problem,” referenced by Pearl [4], fail to satisfy such
assumptions and cannot be correctly said to be theoretical
failures. We agree with Pearl, however, in his criticism of
the use of the Dempster formula to produce a conditioning
formula, leading to counterintuitive results (the “spoiled
sandwich” effect), which we consider also to reflect failure
of the basic independence assumptions. We are endeavoring,
however, to extend the original theory to produce
expressions that generate and utilize conditional belief
information [5].

The second type of criticism is based on the erroneous
assumption that the two evidential bodies being combined
should be interpreted as bounds, provided by two independent
“experts,” which constrain the values of the same
probability distribution. As pointed out before, the formula
combines two different conditional probability
distributions.

5 On Generality and Complexity 

The lack of generality of the belief-function approach to
represent interval constraints on a family of probability
distributions is well known. Our reliance on the
belief-function approach, in spite of such lack of
generality, is based on two major considerations.

First, our experience shows that, notwithstanding
criticisms based on unrealistic worst-case scenarios, the
approach is computationally efficient. In particular, we
have found that representation of belief functions in terms
of basic probabilistic assignments results in a storage and
manipulation scheme that is both economical and easy to
understand. In addition, we have successfully implemented
tools, such as summarization and coarsening operators, which
may be effectively utilized to limit representational
complexity.

Second, our current functional operators have 
been chosen to guarantee that probabilistic infor¬ 
mation will always be capable of being represented 
within the scope of the approach, as more general 
constraints do not either enter into consideration 
or appear as the result of any of its functions. 

Our current concerns with the manipulation of conditional
and dependent evidence show, however, that, for some
important problems, the results of evidential combination
fall outside the scope of its representational capabilities.
Although more general schemes, such as interval
probabilities, do not suffer from this limitation, their
inherent complexity precludes their practical application.

Ongoing research indicates, on the other hand, that the
belief-function approach may be used to approximate the
results of these general evidential combination operations.
This research also shows the basic errors inherent in
criticisms that regard the belief-function approach as a
fully developed methodology incapable of sustaining further
enhancement and modification. Having been studied in depth
for only fifteen years, the approach has the technological
status of a young discipline, capable of enhancement on its
own and of combination with other approaches to produce more
general tools for probabilistic reasoning. Far from proving
that we have reached a technological plateau, our
investigations indicate that much is yet to be gained from
such a development and integration process.

Abstract

We address recent criticisms of evidential reasoning: an approach to the analysis of imprecise 
and uncertain information that is based on the Dempster-Shafer calculus of evidence. 

We show that evidential reasoning can be interpreted in terms of classical probability 
theory and that the Dempster-Shafer calculus of evidence may be considered to be a form 
of generalized probabilistic reasoning based on the representation of probabilistic ignorance 
by intervals of possible values. In particular, we emphasize that it is not necessary to resort
to nonprobabilistic or subjectivist explanations to justify the validity of the approach. 

We answer conceptual criticisms of evidential reasoning primarily on the basis of
their confusion of the current state of development of the theory (mainly theoretical
limitations in the treatment of conditional information) with its potential usefulness to treat
a wide variety of uncertainty-analysis problems. Similarly, we indicate that the supposed
lack of decision-support schemes of generalized probability approaches is not a theoretical 
handicap but, rather, an indication of basic informational shortcomings that is a desirable 
asset of any formal approximate reasoning approach. We also point to potential shortcomings 
of the underlying representation scheme to treat general probabilistic reasoning problems. 

We also consider methodological criticisms of the approach, focusing primarily on the
alleged counterintuitive nature of Dempster’s combination formula and showing that such
results are the consequence of its misapplication. We also address issues of complexity and
validity of scope of the calculus of evidence.







1 Introduction 


If artificially intelligent systems are to produce adequate assessments of the state and
behavior of the real world, they must cope with information and knowledge that is characterized
by varying degrees of uncertainty, ignorance, and correctness. To address this need, we have
developed a technology called evidential reasoning. It is formally based upon the
Dempster-Shafer theory of belief functions; it has been implemented as a domain-independent
automated reasoning system; it has been successfully applied to a range of real-world
problems [11]. Yet, its reliance on belief functions has drawn criticism.

Our choice of an approach based on the Dempster-Shafer theory was not arbitrary. We 
believe that it has important methodological advantages such as its ability to represent 
ignorance in a direct and straightforward fashion, its consistency with classical probability 
theory, its compatibility with Boolean logic, and its manageable computational complexity. 
At the same time, we recognize that other approaches may also complement and augment 
the assessments provided by evidential reasoning. 

We examine several criticisms of belief functions that have appeared in the literature, 
discussing first the fundamental theoretical bases supporting the belief-function approach and 
justifying its use in terms of the requirements imposed by ignorance of certain probability 
distributions. We consider the nature of Dempster’s rule of combination and argue that 
negative assessments either misinterpret the nature of the distributions being combined or 
ignore the basic independence assumptions that assure its validity. We stress also that it is 
not necessary to rely on explanations that are either nonprobabilistic or subjective to justify 
the validity of the Dempster-Shafer calculus of evidence. 

Furthermore, we show that certain apparently counterintuitive properties of the approach 
(e.g., the “spoiled sandwich” paradox) are the natural consequence of considering families 
of possible probability distributions that solve an approximate reasoning problem. In the 
context of this discussion, we indicate also the inherent pitfalls of “axiomatic” approaches 
that accept or reject methodologies on the basis of their compliance with allegedly intuitive 
principles. 

We also answer critiques based on the computational complexity of the belief-function
approach. Such criticisms claim that the complexity of probabilistic knowledge
representations grows exponentially with the size of the frame, thus making the theory
unsuited for automated reasoning. Other comments addressed in our presentation center on
limitations on the representational ability of belief functions and the lack of certain
methodological capabilities (e.g., decision-making mechanisms).

Despite the criticism that belief functions have drawn, we believe that evidential reasoning 
is well-founded and that it may be effectively applied to the solution of a broad range of 
important practical problems. 

Most of our comments will be made in direct reply to a recent criticism of the
belief-function approach by Pearl [15] since we feel that his paper encompasses most of the
major worries and concerns with the calculus of evidence. While most of the discussion of this
paper consists of direct responses to issues raised by Pearl and others, our overall objective
is considerably broader. Our answers are motivated by the same remarks of DeGroot, quoted 
by Pearl at the conclusion of his work, about the need to use our methodological approaches 
“... with the utmost care and in accordance with the highest ethical standards.” Our aim, 
like Pearl’s, is to enlighten and clarify, through careful discussion of rather subtle and delicate 
issues, rather than to engage in dogmatic defense of one approach to the detriment of another. 
It is our earnest hope that this work, in conjunction with other evaluations of the belief- 
function approach, will help to understand its bases, capabilities, and limitations. 

2 On Theoretical Soundness 

The theory of belief functions was originated by Dempster [4] in the context of statistical 
research. The use of the term “belief,” together with its subjectivist connotations, is due to
Shafer [18], who first applied the theory to the analysis of imprecise and uncertain evidence. 

Although much skepticism has been voiced about the naturalness of belief functions and
their agreement with conventional probabilistic approaches, their theoretical bases are
provided by a simple consideration about the role of evidence as a basic information carrier.

In classical probabilistic treatments, it is assumed that, under certain evidential
conditions $\mathcal{E}$,¹ the value $\Pr(p \mid \mathcal{E})$ of the likelihood of a particular
statement p is known. This view of evidence, while adequate to represent the informational
conditions of most controlled experimental setups, fails, however, to adequately model the
effects that acquiring similar information has on our state of knowledge when the state of the
world cannot be so readily manipulated.

In such circumstances, whenever the evidence $\mathcal{E}$ is observed, three possible
informational outcomes may result from examination of further information that later turns out
to improve our state of knowledge: either p is found to be true, $\neg p$ is found to be true
(i.e., p is false), or such information is insufficient to determine the truth value of p. Use
of modal logic concepts, which are the bases of the formal model of Ruspini [17], suggests the
use of the notation $Kp$, $K\neg p$, and $Ip$ to identify these outcomes. Since these
alternatives are exclusive and exhaustive, it is clear that

$$\Pr(Kp) + \Pr(K\neg p) + \Pr(Ip) = 1.$$

Furthermore, since the probability of $Ip$ may be positive, it will be true, in general, that

$$\Pr(Kp) + \Pr(K\neg p) \le 1.$$

This model, based on a combination of classical probability methods and the modal logic
S5 [8,12], essentially provides, through the logical notion of possible world, a meaning
for the unary operator K as the representation of the state of knowledge of a statistician
that is estimating the probability of truth of diverse propositions {p, q, ...} under evidential
conditions $\mathcal{E}$.

¹ Throughout this paper, the symbol $\mathcal{E}$ is used to denote available evidence, i.e., a
collection of propositions about the real world that are known to be true either as the result
of direct observation or as the consequences of applicable background knowledge.

This statistician estimates those distributions by considering multiple samples of the 
state or behavior of a real-world system. Using, for each sample, additional information 
collected through further experimentation, the statistician may then establish or not the 
validity of a proposition p. If he is rather lucky, our statistician will find himself in the ideal 
situation where he can actually “know”² or “prove” that the real world is in a state s that
is described to the best level of detail that is necessary to understand its behavior (i.e., a 
“possible world”). This is the state of knowledge usually attained, under perfect laboratory 
conditions, when experimental samples are fully analyzed and when the outcome of such 
analyses is classified in terms of a set of exhaustive and mutually exclusive alternatives. 

Under less desirable epistemological circumstances, however, the statistician will only be 
able to prove that a less specific proposition p is true. In the extreme case where no further 
information exists, he will be forced to say that his knowledge is limited to that provided by 
the evidence $\mathcal{E}$, or that it is “vacuous.”

All samples so analyzed, however, can be classified as to the “most specific knowledge”
that could be determined in each case. The corresponding probability measure of the set
e(p) of samples where the proposition p was the most specific knowledge (called an epistemic
set by Ruspini) corresponds, in Shafer’s framework, to the value m(p) of a mass function m,
i.e.,

$$m(p) = \Pr(e(p)).$$

Correspondingly, the probability that p was “known” to be true during statistical experi¬ 
mentation, corresponds to the value Bel(p) of Shafer's belief function, i.e., 

$$\mathrm{Bel}(p) = \Pr(Kp).$$
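
As an illustration of these frequency-based definitions, the following minimal Python sketch estimates a mass function and the frequencies of the three informational outcomes from a list of samples. The frame `FRAME`, the list `SAMPLES`, and the function names are hypothetical choices made for this example only; propositions are represented as sets of possible worlds, so that "q implies p" becomes set inclusion, and every sample is assumed to record a consistent (nonempty) most specific proposition.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical frame of possible worlds (an assumption of this sketch).
FRAME = frozenset({"a", "b", "c"})

# Hypothetical experiment: for each sample, the most specific proposition
# the statistician could prove, given as the set of worlds where it holds.
SAMPLES = [{"a"}, {"a"}, {"a", "b"}, {"b"}, set(FRAME), {"a", "b"}, {"c"}, {"a"}]


def mass_from_samples(samples):
    """Estimate m(p) as the relative frequency of samples whose most
    specific proved proposition is exactly p."""
    counts = Counter(frozenset(s) for s in samples)
    return {p: Fraction(n, len(samples)) for p, n in counts.items()}


def outcome_frequencies(samples, p):
    """Return the relative frequencies of the outcomes Kp, K(not p), Ip."""
    p = frozenset(p)
    not_p = FRAME - p
    total = len(samples)
    known = sum(1 for s in samples if frozenset(s) <= p)        # p was proved
    refuted = sum(1 for s in samples if frozenset(s) <= not_p)  # not-p was proved
    undecided = total - known - refuted                         # neither was proved
    return (Fraction(known, total),
            Fraction(refuted, total),
            Fraction(undecided, total))


if __name__ == "__main__":
    print(mass_from_samples(SAMPLES))
    print(outcome_frequencies(SAMPLES, {"a", "b"}))
```

In this hypothetical run, six of the eight samples establish the proposition {a, b}, one refutes it, and one leaves it undecided, so the three frequencies add up to one while the frequencies of "known true" and "known false" alone fall short of one.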

The connection between the ability of our statistician to know that p was true and the
belief and mass functions that he estimates through experimentation justifies both the
expression epistemic probability, introduced by Ruspini [17] to describe the underlying
probabilities defined over a particular set of situations or scenarios Kp, (called the epistemic
universe), and their description as being “probabilities of provability” or “probabilities of
necessity” by Pearl [14], following a suggestion by Fagin and Halpern [6].

In short, all such interpretations are equivalent to the original model of Ruspini, where
a rational agent was able to prove the truth of different propositions under different
informational circumstances that were found to prevail, during his statistical experiment, with
different frequencies of occurrence. Note, however, that while use of the terms “knowability,”
“provability,” and “necessity” does much to provide adequate semantics to the calculus
of evidence, their loose usage leads to unnecessary confusion. In his recent criticism [15],
Pearl takes some questionable semantic license with the term “necessity,” mentioning, for
example, the probability that a decision “will have to [be] made out of compelling
necessity.” Such “pragmatic” necessity does not have anything to do, of course, with the
“logical necessity” that underlies the Dempster-Shafer theory, i.e., the necessary truth of a
proposition given available evidence.

² Note that, in the context of epistemic logics such as S5, the operator K behaves as a logical
necessity operator. “Knowing” a proposition simply means that observations logically imply such
proposition, or that it is necessarily true.

Since the ability to prove a proposition q entails the ability to prove any proposition p
that is implied by q, it should be clear that

$$\mathrm{Bel}(p) = \sum_{q \Rightarrow p} m(q),$$

i.e., the fundamental equation relating the basic structures of the calculus of evidence. It is
also true, as shown by Ruspini, that

$$\mathrm{Bel}(p) \le \Pr(p) \le 1 - \mathrm{Bel}(\neg p),$$

providing bounds for the probability of p that cannot be improved. This ability to manipulate
probability intervals by means of the compact representation scheme of mass functions
is the major reason for the appeal of the Dempster-Shafer methodology.
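
The relation between mass functions and probability intervals can be made concrete with a short Python sketch. Everything named here (`FRAME`, `MASS`, `belief`, `plausibility`, `allocate`, and the two selection functions) is a hypothetical illustration rather than part of the reported system; "plausibility" is the usual name in Shafer's framework for the upper bound 1 - Bel(not p). The sketch also builds two probability distributions compatible with the mass function that attain the two bounds for one proposition, which is one way to see why the bounds cannot be improved.

```python
from fractions import Fraction

# Hypothetical frame of possible worlds (an assumption of this sketch).
FRAME = frozenset({"a", "b", "c"})

# Hypothetical mass function: each key is a proposition, given as the set
# of possible worlds in which it holds.
MASS = {
    frozenset({"a"}): Fraction(3, 8),
    frozenset({"b"}): Fraction(1, 8),
    frozenset({"c"}): Fraction(1, 8),
    frozenset({"a", "b"}): Fraction(1, 4),
    FRAME: Fraction(1, 8),
}


def belief(mass, p):
    """Bel(p): total mass of the propositions q that imply p (q subset of p)."""
    p = frozenset(p)
    return sum((m for q, m in mass.items() if q <= p), Fraction(0))


def plausibility(mass, p):
    """Upper bound 1 - Bel(not p)."""
    return 1 - belief(mass, FRAME - frozenset(p))


def allocate(mass, choose):
    """Build one probability distribution compatible with the mass function
    by moving the mass of each proposition onto one world chosen from it."""
    pr = {w: Fraction(0) for w in FRAME}
    for q, m in mass.items():
        pr[choose(q)] += m
    return pr


def favor_a(q):
    """Pick world "a" whenever the proposition allows it."""
    return "a" if "a" in q else min(q)


def avoid_a(q):
    """Pick a world other than "a" whenever the proposition allows it."""
    rest = q - {"a"}
    return min(rest) if rest else "a"


if __name__ == "__main__":
    print(belief(MASS, {"a"}), plausibility(MASS, {"a"}))
    print(allocate(MASS, avoid_a)["a"], allocate(MASS, favor_a)["a"])
```

For the proposition {a}, the interval is [3/8, 3/4], and the two allocations reach exactly its endpoints.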

While the above discussion clarifies the nature of the statistician’s knowledge modeled
by belief and mass functions, doubts might still remain as to their utility to those who were
not involved in their statistical estimation process. Such usage is, however, that made of
any other probabilistic information. The analyst that observes $\mathcal{E}$ does not have the
luxury that was available to the statistician estimating epistemic probabilities, i.e., the
ability to collect additional information that permits a more detailed characterization of the
state of the world, for the same reasons that the user of statistical tables is unable to
utilize the raw data of the estimating statistician. Under such circumstances, the analyst is
forced to rely on the probabilistic estimates provided by the statistician, which are believed
on the basis of the assumed regularity of the repetitive behavior of the system: the
epistemological cornerstone of probabilistic reasoning.

In other words, the “probability of provability” is the best information that is available to 
the analyst; an observation that not only disposes of questions about its role in probabilistic 
reasoning, but also of Pearl’s worries about its use in lieu of the obviously more desirable 
“probability of truth” [15]: 

“why we should concern ourselves with the probability that the evidence implies A,
rather than the probability that A is true, given the evidence?”

Clearly, we would prefer having the latter, but, unfortunately, we can only measure the
former.

Our interpretation of the major evidential functions and structures also quickly disposes
of erroneous arguments based on unintended interpretations of the intervals defined by belief
functions. Each such interval represents ignorance of a single probability value for a
proposition p under fixed evidential conditions $\mathcal{E}$. If critics choose, for example, to interpret
such intervals as the possible values that conditional probabilities might attain when further 
evidence is collected, as suggested by Pearl [13], belief functions will not, indeed, behave 
according to such unintended semantics. 

In closing this section, it is important to mention other alternative views of the structures 
of the calculus of evidence such as that recently proposed by Smets [19], which are based on a 
nonprobabilistic concept of belief. Although those models are interesting on the strength of 
their own virtues, we still emphasize that such interpretations are not required to reconcile 
the calculus of evidence with conventional probability theory. 

In consideration of our ability to reconcile all structures and formulas of the calculus of
evidence, including Dempster’s formula, with conventional probability structures, such
as inner and outer probabilities, we do not feel strongly compelled to accept alternative
epistemic interpretations. Our skepticism in this regard is further supported by the observation
that, often, such epistemological alternatives are the result of misunderstandings about the
role of certain evidential formulas and processes (e.g., normalization). For the same reasons,
we remain unconvinced about the need to assign several alternative interpretations to the
structures of the calculus of evidence or to its functions, as in the recent suggestion of
Halpern and Fagin [7], which is echoed by Pearl [15].

3 On Decision Support 

A criticism of a more fundamental nature of the calculus of evidence is often raised regarding
the output of generalized interval-probability approaches. Since these methods often fail, due
to basic knowledge deficiencies, to rank decision choices by the value of some measure that
quantifies the desirability of each choice (e.g., expected utility), it is said that they lack
a decision-theoretic apparatus.

Although these arguments correctly point to the basic knowledge requirement that most
decision problems entail (if a rational choice is to be made, then we must have a proper
informational basis to do it), this obvious consideration is twisted to argue for the
necessity to estimate unknown probability and utility values when they are not available.
We do not think that this “pragmatic necessity” argument is either sound or compelling.

In our view, the calculus of evidence may be used in a straightforward fashion to produce
intervals of possible utility values. When such intervals overlap and cannot be ordered, this
fact simply reflects a basic deficiency in our knowledge. We look down upon “pragmatic
justifications” with the same concern that any experimental scientist must show about
proposals to guess what he has not measured: the ability to make decisions in the absence of
knowledge is, in our view, a handicap rather than an advantage of any method.

Far from lacking a decision-theoretic methodology, our approach provides an easily
understandable quantification of the undesirable effects that poor information has on our
decision-making ability, ordering decisions whenever it is rationally possible but advising us
that such ranking is not possible if our knowledge is insufficient. In brief, our approach not
only supports decision-making but, through its built-in sensitivity-analysis features, helps us
to determine what must be done to reach a happier epistemological state.³
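
One way to picture such utility intervals is sketched below in Python. This is only an illustration under stated assumptions, not the procedure of the reported system: the lower and upper expected utilities are obtained by assigning the mass of each proposition to its worst and best world, respectively, and the worlds, the mass function `MASS`, the table `UTILITY`, and the function names are all hypothetical.

```python
from fractions import Fraction

# Hypothetical mass function over two possible worlds (an assumption).
MASS = {
    frozenset({"rain"}): Fraction(2, 5),
    frozenset({"dry"}): Fraction(1, 5),
    frozenset({"rain", "dry"}): Fraction(2, 5),
}

# Hypothetical utilities of three courses of action in each possible world.
UTILITY = {
    "umbrella": {"rain": 8, "dry": 6},
    "no_umbrella": {"rain": 0, "dry": 10},
    "picnic": {"rain": 0, "dry": 20},
}


def utility_interval(mass, utility):
    """Interval of expected utilities compatible with the mass function:
    each proposition contributes its worst (lower) or best (upper) world."""
    lower = sum(m * min(utility[w] for w in q) for q, m in mass.items())
    upper = sum(m * max(utility[w] for w in q) for q, m in mass.items())
    return lower, upper


def compare(interval_1, interval_2):
    """Order two choices only when their intervals do not overlap."""
    lo1, hi1 = interval_1
    lo2, hi2 = interval_2
    if lo1 > hi2:
        return "first"
    if lo2 > hi1:
        return "second"
    return "undetermined"


if __name__ == "__main__":
    intervals = {a: utility_interval(MASS, u) for a, u in UTILITY.items()}
    print(intervals)
    print(compare(intervals["umbrella"], intervals["no_umbrella"]))
    print(compare(intervals["umbrella"], intervals["picnic"]))
```

With these hypothetical numbers the first pair of actions can be ranked because their intervals are disjoint, whereas the second pair overlaps and is reported as undetermined, which signals that more information is needed rather than forcing a guess.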

4 On Dempster's Rule of Combination 

The semantic model of the Dempster-Shafer theory also validates the so-called Dempster’s 
rule of combination, which permits the combination of belief and mass functions corre¬ 
sponding to different evidential observations, made under certain conditions of independence. 
When such conditions are not valid, use of this formula leads, of course, to erroneous results, 
often, although incorrectly, considered to be an essential handicap of the evidential reasoning 
approach, rather than a consequence of its misapplication. 

The Dempster formula is, currently, the principal evidence integration mechanism of the
belief-function approach. It was derived in the context of a basic model of the effect of
probabilistic evidence that correctly interprets such evidence as constraints on probability
values rather than as the source of the actual values, which are typically undetermined. It may
be described as an expression that, under certain conditions of independence, yields bounds for
the conditional probability distribution $\Pr(\cdot \mid \mathcal{E}_1, \mathcal{E}_2)$ on the
basis of similar bounds for the probability distributions $\Pr(\cdot \mid \mathcal{E}_1)$ and
$\Pr(\cdot \mid \mathcal{E}_2)$.

To understand the conceptual bases for Dempster’s formula of combination and its
consistency with conventional probability, we resort to a generalization of the logical model
used before to derive the basic relations of the calculus of evidence. Instead of considering a
single epistemic operator, corresponding to a single statistician or observer, we will consider
two such rational agents, with their knowledge modeled by means of two operators $K_1$
and $K_2$. Each of these rational agents will be assumed to be ignorant of the knowledge
possessed by the other, i.e., as if they were statisticians performing independent experiments
under different evidential conditions $\mathcal{E}_1$ and $\mathcal{E}_2$. Their common
knowledge, however, will be modeled by means of a nonindexed operator K corresponding to a third
reliable agent that aggregates the statistical knowledge gathered by the other two.

Clearly, in a given applicable situation (i.e., the first agent observes $\mathcal{E}_1$ and the
second agent observes $\mathcal{E}_2$), the integrating agent, who does not add any knowledge of
his own, will be able to prove (or to “know” the truth of) a proposition p if the other agents
provide individual items of information that, when combined (i.e., conjoined), imply p, as
expressed by the basic combination axiom:

    $Kp$ is true if and only if there exist sentences $p_1$ and $p_2$ such that $K_1 p_1$ and
    $K_2 p_2$ are true, and such that $p_1 \wedge p_2 \Rightarrow p$.

³ For an example of an approach that incorporates decision-maker preferences into the framework
of the belief-function calculus, the reader is referred to a recent paper by Strat [21].

Using our three operators to generate all possible (i.e., logically consistent) states of
knowledge that may be attained by each of the three agents while assessing the state of a
real system, we may say that each of them has, as was the case before, a knowledge about
the real world that may be represented by the “most specific”⁴ propositions $p_1$, $p_2$, and
$p$ that each has been able to prove (with p being obviously more specific than either $p_1$ or
$p_2$). In the terminology of Ruspini’s semantic model, each of the agents is in an epistemic
state, denoted by $e(p)$, $e_1(p_1)$, and $e_2(p_2)$, respectively, each corresponding to the
set of all conceivable states of the real world (i.e., possible worlds) having such knowledge
characteristics.

The following important set equation, relating all of these types of epistemic sets as subsets
of our enhanced epistemic universe, is the basis for the derivation of various evidential
combination formulas:

$$e(p) = \bigcup_{p_1 \wedge p_2 = p} \bigl( e_1(p_1) \cap e_2(p_2) \bigr).$$

Of these formulas, the Dempster combination formula

$$m(p) = \lambda \sum_{p_1 \wedge p_2 = p} m_1(p_1)\, m_2(p_2),$$

where

$$m(p) = \Pr(e(p) \mid \mathcal{E}_1, \mathcal{E}_2), \quad m_1(p_1) = \Pr(e_1(p_1) \mid \mathcal{E}_1), \quad m_2(p_2) = \Pr(e_2(p_2) \mid \mathcal{E}_2),$$

and where $\lambda$ is a multiplicative factor, is the best known and used.
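
The combination formula can be sketched in a few lines of Python. The sketch assumes independent bodies of evidence, represents propositions as sets of possible worlds so that conjunction is set intersection, and takes the multiplicative factor to be the usual normalization that redistributes the mass falling on the contradiction (the empty set). The frame and the mass functions `M1` and `M2` are hypothetical, as is the function name.

```python
from fractions import Fraction
from itertools import product

# Hypothetical frame and mass functions for two independent bodies of evidence.
FRAME = frozenset({"a", "b", "c"})
M1 = {frozenset({"a"}): Fraction(1, 2), FRAME: Fraction(1, 2)}
M2 = {
    frozenset({"b"}): Fraction(1, 3),
    frozenset({"a", "b"}): Fraction(1, 3),
    FRAME: Fraction(1, 3),
}


def dempster_combine(m1, m2):
    """Combine two mass functions: multiply the masses of every pair of
    propositions, credit the product to their conjunction (intersection),
    discard the mass of contradictory pairs, and renormalize."""
    raw = {}
    for (p1, w1), (p2, w2) in product(m1.items(), m2.items()):
        p = p1 & p2
        raw[p] = raw.get(p, Fraction(0)) + w1 * w2
    conflict = raw.pop(frozenset(), Fraction(0))
    if conflict == 1:
        raise ValueError("the two bodies of evidence are totally contradictory")
    factor = 1 / (1 - conflict)  # the multiplicative (normalization) factor
    return {p: factor * w for p, w in raw.items()}


if __name__ == "__main__":
    for proposition, value in dempster_combine(M1, M2).items():
        print(sorted(proposition), value)
```

In this hypothetical example one sixth of the product mass falls on contradictory pairs, so the remaining masses are scaled by 6/5 and again add up to one.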

Before reviewing the actual process leading to the derivation of Dempster’s formula,
it is important to pause and reflect upon the nature of the above set-theoretic equation and
its usefulness to derive evidence combination formulas.

We may first note that this equation has been derived as a relation between subsets of
possible “epistemological states” that is valid regardless of any assumptions about
probabilistic structures and their properties (e.g., independence). As such, it not only
provides the bases for the derivation of the Dempster formula but actually of a variety of
formulas that bound possible probability values within and without the structures of the
Dempster-Shafer theory.

Basically, this formula provides the basis to extend a probability function Pr that is
known over subsets of the form $e_1(p_1)$ and $e_2(p_2)$ (i.e., over two $\sigma$-algebras), to
the set of unions of sets of the form $e_1(p_1) \cap e_2(p_2)$ (i.e., another $\sigma$-algebra).
If such extension can be made uniquely, as is the case for the Dempster formula, the resulting
extension may be used to generate both the conditional probability
$\Pr(\cdot \mid \mathcal{E}_1, \mathcal{E}_2)$ and associated bounds Bel and Pl, which are fully
compliant with the Shafer axioms. In other less fortunate cases (e.g., dependent evidence), such
extension is not unique and the lower envelope of the possible extensions, which is not a
probability, will lead to bounds that do not satisfy the axioms of the calculus of evidence.

⁴ Note that such most specific knowledge always exists and is unique, but for logical
equivalences, since the conjunction of all proved theorems is itself a theorem.

A most important remark that must be made in this regard is that this equation is now
being used to extend the evidential calculus approach by generalizing the notion of
conditional probability through study of the probabilistic relations that define dependencies between
the different types of epistemic sets (i.e., $e(p)$, $e_1(p_1)$ and $e_2(p_2)$). Pearl [15], however,
believes, apparently as the result of his examination of the role of compatibility relations in the
calculus of evidence, that this approach is essentially limited in its expressive ability to
set-theoretic relations between epistemic sets, which correspond to classical logical conditional
statements (i.e., material implications).

In fact, it may be easily seen from our epistemic identity that whenever the conditional
probabilities $\Pr(e_2(p_2) \mid e_1(p_1))$ and $\Pr(e_1(p_1) \mid e_2(p_2))$ are restricted to take the values 0 or
1,^5 then this identity may be used to map one body of evidence into another, i.e., by means
of the compatibility relations that such probabilities define.

Since under these assumptions, however, there can be only one proposition $p_2$ for every
proposition $p_1$ such that $\Pr(e_2(p_2) \mid e_1(p_1)) = 1$, and vice versa, the compatibility relation
that is so defined may be characterized by several implications of the form

$$e_1(p_1) \Rightarrow e_2(p_2),$$

and of the form

$$e_2(q_2) \Rightarrow e_1(q_1),$$

between knowledge states of one observer and knowledge states of the other, which are useful
to “transfer mass” between propositions. This correspondence must be contrasted with that
following from the limited interpretation given by Pearl who, from knowledge of

$$e_1(p_1) \Rightarrow e_2(p_2),$$

concludes (by contraposition), correctly but narrowly, that

$$\neg e_2(p_2) \Rightarrow \neg e_1(p_1),$$

proceeding then to attach all material implication paradoxes (e.g., the “ravens paradox”) to
the calculus of evidence as if they were an essential methodological bane. If that were
the case (clearly it is not), the same concerns should be raised about the use of conditionals
in conventional probability calculus.

The second observation that may be made about the nature of evidence combination, in
general, and the role of our basic set identity to generate combination formulas, in particular,
is that while the functions to be combined are conditional probabilities over two different
evidential sets $\mathcal{E}_1$ and $\mathcal{E}_2$ (i.e., the evidence observed by two agents), the desired integrated
probability is a distribution over $\mathcal{E}_1 \cap \mathcal{E}_2$ (since we know that both observations are correct).
Except for unusual cases, however, computation of $\Pr(\cdot \mid \mathcal{E}_1, \mathcal{E}_2)$ entails a “normalization”
operation that is fully consistent with the calculus of probability. Most of the normalization
“paradoxes” are the result of misunderstanding about what is being combined: two different
conditional probabilities rather than two different lower and upper bounds of the same
probability function.^6

5 It may be shown from the definition of epistemic sets that, under such conditions, knowledge of
$\Pr(e_2(p_2) \mid e_1(p_1))$ suffices to derive $\Pr(e_1(p_1) \mid e_2(p_2))$.

Focusing now on the rationale for Dempster’s formula, we should notice first that the
epistemic sets $e_1(p_1)$ and $e_2(p_2)$ are such that

$$e_1(p_1) \subseteq \mathcal{E}_1, \qquad e_2(p_2) \subseteq \mathcal{E}_2,$$

i.e., the possible knowledge states of each statistician include awareness of the truth of the
evidence that is observed by each. Furthermore,

$$\mathcal{E}_1 = \bigcup_{p_1} e_1(p_1), \qquad \mathcal{E}_2 = \bigcup_{p_2} e_2(p_2),$$

where $p_1 \Rightarrow \mathcal{E}_1$ and $p_2 \Rightarrow \mathcal{E}_2$, i.e., each statistician knows something that implies that his
evidential observation is true (otherwise he would not be “counting” that sample).^7

Assume now that there exists a probability distribution Pr defined over the space of all
possible epistemic states for our observing statisticians and our “integrating” agent. Each
such epistemic state is a possible world that corresponds to a possible state of the world and
to a possible state of knowledge for each agent that, in addition, is consistent with the laws
of logic. We will assume now that, whenever $p_1 \Rightarrow \mathcal{E}_1$ and $p_2 \Rightarrow \mathcal{E}_2$, it is

$$\Pr(e_1(p_1) \cap e_2(p_2)) = \begin{cases} \Pr(e_1(p_1))\, \Pr(e_2(p_2)), & \text{if } p_1 \wedge p_2 \neq 0, \\ 0, & \text{otherwise.} \end{cases}$$

This assumption simply states that, when $\mathcal{E}_1$ and $\mathcal{E}_2$ are both true, the probability that a
rational observer will be in a particular knowledge, or epistemic, state does not provide any
information about the probability of the epistemic state of the other agent (i.e., beyond ruling
out logical impossibilities). In purely formal terms, we may say that knowledge of values of
Pr over sets of the form $e_1(p_1)$ does not provide any indication, beyond exclusion of logical
impossibilities, of the values of Pr over sets of the form $e_2(p_2)$ and vice versa. The epistemic
states of our two agents may be said, therefore, to be uncorrelated in that knowledge of the
state of one of our observers (by our integrating agent) does not provide any information
about the state of the other, save for elimination of logical impossibilities.

6 It is fair to say that much of the skepticism raised by the normalization used in Dempster’s formula can
be traced to the exposition given by Shafer [18], which suggests excessive reliance on unfounded heuristics.

7 Recall that our observers, or rational agents, are statisticians estimating properties of certain statistical
distributions by classifying each sample using their evidence and additional sample-dependent knowledge.

Noting now that

$$\Pr(e_1(p_1) \mid \mathcal{E}_1) = \frac{\Pr(e_1(p_1))}{\Pr(\mathcal{E}_1)}, \qquad \Pr(e_2(p_2) \mid \mathcal{E}_2) = \frac{\Pr(e_2(p_2))}{\Pr(\mathcal{E}_2)},$$

$$\Pr(e_1(p_1) \cap e_2(p_2) \mid \mathcal{E}_1, \mathcal{E}_2) = \frac{\Pr(e_1(p_1) \cap e_2(p_2))}{\Pr(\mathcal{E}_1 \cap \mathcal{E}_2)},$$

then, whenever $p_1 \wedge p_2 \neq 0$, it is

$$\Pr(e_1(p_1) \cap e_2(p_2) \mid \mathcal{E}_1, \mathcal{E}_2) = \kappa\, \Pr(e_1(p_1) \mid \mathcal{E}_1)\, \Pr(e_2(p_2) \mid \mathcal{E}_2) = \kappa\, m_1(p_1)\, m_2(p_2),$$

from which Dempster’s formula readily follows.

The normalization factor

$$\kappa = \frac{\Pr(\mathcal{E}_1)\, \Pr(\mathcal{E}_2)}{\Pr(\mathcal{E}_1 \cap \mathcal{E}_2)}$$

has been the object of considerable concern by both skeptics and proponents of the calculus of
evidence. The above expression, however, provides the rationale for its usage while disposing
of arguments about its alleged inconsistency with the probability calculus. In that expression,
the denominator $\Pr(\mathcal{E}_1 \cap \mathcal{E}_2)$ appears as the consequence of the need to derive probability
distribution estimates with respect to the intersection of the two observed evidences $\mathcal{E}_1$
and $\mathcal{E}_2$. The numerator of that expression simply reflects the need to combine conditional
distributions over the same reference set (i.e., the epistemic universe) while our probabilistic
knowledge is expressed over two of its subsets (i.e., $\mathcal{E}_1$ and $\mathcal{E}_2$).
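
The combination and normalization steps discussed above can be illustrated with a minimal Python sketch. It assumes that propositions are represented as subsets of a small finite frame (so that conjunction becomes set intersection) and that the combined masses must add up to one, in which case the multiplicative factor is the reciprocal of the total product mass assigned to logically compatible pairs. The function name `dempster_combine`, the frame, and the mass values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions given as {frozenset: mass} dictionaries.

    Returns the combined mass function and the multiplicative factor used
    to renormalize it.
    """
    unnormalized = {}
    for (s1, w1), (s2, w2) in product(m1.items(), m2.items()):
        joint = s1 & s2  # conjunction of propositions as set intersection
        if joint:  # logically impossible pairs receive no mass
            unnormalized[joint] = unnormalized.get(joint, 0) + w1 * w2
    compatible = sum(unnormalized.values())
    if compatible == 0:
        raise ValueError("totally conflicting evidence: no normalization exists")
    kappa = 1 / compatible  # makes the combined masses add up to one
    return {s: kappa * w for s, w in unnormalized.items()}, kappa


# Hypothetical frame and bodies of evidence (illustrative values only).
frame = frozenset("abc")
m1 = {frozenset("a"): Fraction(1, 2), frame: Fraction(1, 2)}
m2 = {frozenset("bc"): Fraction(1, 3), frozenset("ab"): Fraction(2, 3)}

combined, kappa = dempster_combine(m1, m2)
for focal, mass in sorted(combined.items(), key=lambda item: sorted(item[0])):
    print(sorted(focal), mass)
print("normalization factor:", kappa)
```

Exact rational arithmetic is used so that the renormalized masses can be checked to sum to one without rounding concerns. The sketch is only meaningful when the independence assumption stated above holds.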

The essence of the conditions that lend validity to the Dempster formula may be
summarized by saying that its usefulness is confined to the limited, but rather important, cases
where estimates of probabilistic likelihood have been formulated by two rational agents on
the basis of independent observations, while ignoring the evidence available to each other.

If our integrating agent is thought of as being concerned with estimating the probabilities
of certain events when both $\mathcal{E}_1$ and $\mathcal{E}_2$ are true, then we may say that, whenever the
conditions validating Dempster’s formula hold, knowledge of the fact that a particular
sample satisfies $p_1$ tells him nothing about the likelihood of $p_2$ (unless, of course, $p_1$ happens
to be logically inconsistent with $p_2$). Furthermore, whenever our integrating agent is done
with his job, he should find out that estimating this joint distribution (i.e., over $\mathcal{E}_1 \cap \mathcal{E}_2$)
could have been accomplished in an easier fashion by estimating the marginal distributions
over $\mathcal{E}_1$ and $\mathcal{E}_2$ and deriving the joint distribution by multiplication and normalization.

Other accounts supporting the validity of Dempster’s formula and its consistency with
the probability calculus have been advanced by several authors. A particularly compelling
justification has been recently given by Wilson [22].


5 On Paradoxes 

Criticisms of the Dempster formula may be broadly characterized as being the consequence 
of basic misunderstandings about either its meaning or its validity. 






In this section, we examine three alleged paradoxes of the theory, showing that the
purported inconsistencies are actually the results of conceptual misunderstandings or
misrepresentations of the position of those who, while generally supporting the calculus of evidence,
are concerned with its possible misapplication.

5.1 The Three Prisoner Problem 


Turning our attention first to concerns about the validity of Dempster’s formula, we may
note that, in general, such examples ignore its scope of applicability, producing
counterintuitive results that are then used to dismiss the methodology as inadequate. Among those, the
“three prisoner” problem discussed by Diaconis and Zabell [5] has been perhaps the most
quoted and discussed.

This problem is one of a variety of examples where the combination formula is used as a
conditioning formula by assuming that one of the mass distributions being combined simply
assigns all its mass to a proposition $p$ in the frame of discernment. Combination of such a
simple support function with another mass function associated with a belief function $\mathrm{Bel}(\cdot)$
leads to the conditioning formula

$$\mathrm{Bel}(q \mid p) = \frac{\mathrm{Bel}(q \vee \neg p) - \mathrm{Bel}(\neg p)}{1 - \mathrm{Bel}(\neg p)}.$$
In the particular case of the three prisoners problem, concerned with the guilt or innocence
of a prisoner that has been chosen (by the Warden) as the guilty party by random draw among
three candidates $A_1$, $A_2$ and $A_3$, our “logical space” or frame of discernment is simply the
Boolean algebra induced by the three noncompatible propositions

$p_i$: “Prisoner $A_i$ has been found guilty,”

where $i = 1, 2, 3$. Since only one of the three prisoners is chosen by the Warden, we clearly
have

$$\Pr(p_i) = \tfrac{1}{3}, \qquad i = 1, 2, 3.$$

(Note that Pr is actually a conventional probability distribution.)

Prisoner $A_1$ now asks the Jailer to name one of the innocent prisoners other than him,
arguing that such information would clearly be of little help to him as an indicator of his
potential fate. As Pearl notes, if $q$ stands for the proposition “The Jailer names $A_2$ as one
of the innocent,” then application of the conditioning rule leads to the result

$$\mathrm{Bel}(p_1 \mid q) = \mathrm{Pl}(p_1 \mid q) = \tfrac{1}{2},$$

indicating that the conditional probability $\Pr(p_1 \mid q)$ must be exactly $\tfrac{1}{2}$, instead of the “correct
solution”

$$0 \le \Pr(p_1 \mid q) \le \tfrac{1}{2},$$

while also saying, against the correct intuition of $A_1$, that his chances of guilt have been
increased as the result of the irrelevant information provided by the Jailer. From such
an observation, Pearl concludes that the formula is seriously flawed both because of the
counterintuitive result that it produces and for its “collapsing” of a family of solutions into
a single value.
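
The collapse to a single value can be reproduced with a short Python sketch of the conditioning formula. It assumes, as an illustrative simplification that is not spelled out in the text, that the Jailer's statement is encoded as the proposition “$A_2$ is not guilty” on the three-element frame of guilty prisoners. The helper names `bel` and `bel_given` are hypothetical.

```python
from fractions import Fraction


def bel(mass, event):
    """Belief of an event: total mass of the focal elements contained in it."""
    return sum(w for focal, w in mass.items() if focal <= event)


def bel_given(mass, frame, event, given):
    """Conditioning formula (Bel(q or not p) - Bel(not p)) / (1 - Bel(not p))."""
    not_given = frame - given
    return (bel(mass, event | not_given) - bel(mass, not_given)) / (
        1 - bel(mass, not_given)
    )


frame = frozenset({"A1", "A2", "A3"})
# The Warden's random draw: a conventional (Bayesian) mass function.
prior = {frozenset({a}): Fraction(1, 3) for a in frame}

# Assumed encoding of the Jailer's statement: "A2 is not the guilty one".
a2_innocent = frame - {"A2"}
a1_guilty = frozenset({"A1"})

belief = bel_given(prior, frame, a1_guilty, a2_innocent)
plausibility = 1 - bel_given(prior, frame, frame - a1_guilty, a2_innocent)
print(belief, plausibility)
```

Under this encoding belief and plausibility coincide, which is the single value that Pearl objects to; the encoding itself discards how the Jailer chooses whom to name.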

Before proceeding to the discussion of Pearl’s concerns we may note, in passing, that this
problem has been well known as a source of paradoxes and incorrect solutions within the
scope of the conventional probability calculus [2], quite independently of any issues of validity
of its treatment using the Dempster-Shafer calculus. Curiously enough, the explanations
given to describe the conceptual errors leading to incorrect classical treatments resemble, to
some extent, those that shed light on the inapplicability of Dempster’s formula.

Returning now to the role of Dempster’s formula in this problem, we may first observe
that although, at first glance, the distributions representing the Jailer’s and Warden’s choices
seem independent, it is actually impossible for the Jailer to tell $A_1$ that $A_2$ is one of those
to be spared if all he knew was that the Warden was choosing the guilty party by
random draw (i.e., he needs to know exactly who is the one chosen for punishment). To use
the terminology of Ruspini’s model, the probability of $A_2$ being named as one of the innocent
depends on the epistemic state of the Warden, thus violating the independence assumptions
of Dempster’s formula. If all possible combinations of truth values for the propositions
$p_i$, $i = 1, 2, 3$, and $q$ are tabulated, together with their probabilities, as done in Table 1, then
it is clear that

$$\Pr(q \mid p_3) = 1, \qquad \Pr(q) = \tfrac{1}{3}(1 + \alpha),$$

where $0 \le \alpha \le 1$ represents the unknown probability that the Jailer will choose to name $A_2$
rather than $A_3$ as innocent if $A_1$ is actually the one chosen by the Warden as guilty.


Possible World   Warden's Choice   Jailer Identifies   Probability
w_1              A_1               A_2                 \alpha/3
w_2              A_1               A_3                 (1 - \alpha)/3
w_3              A_2               A_3                 1/3
w_4              A_3               A_2                 1/3

Table 1: Possible Worlds in the Three Prisoners Problem


But then,

$$\Pr(q \mid p_3) \neq \Pr(q),$$

violating the assumptions, discussed above, that validate the utilization of Dempster’s
formula (i.e., here $\Pr(e_2(p_2) \mid e_1(p_1)) \neq \Pr(e_2(p_2))$). There is not, therefore, “total mystery,” as
Pearl says, as to the incorrect results obtained using Dempster’s formula. Failing to be
applicable, there should be little wonder that it leads to apparent paradox.
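
The dependence can be checked directly by enumerating the four possible worlds of Table 1. The following Python sketch is illustrative only; the helper names (`worlds`, `prob`, `conditional`) and the sampled values of $\alpha$ are hypothetical choices made for this example.

```python
from fractions import Fraction


def worlds(alpha):
    """Possible worlds of Table 1 as (guilty, named_innocent, probability)."""
    third = Fraction(1, 3)
    return [
        ("A1", "A2", third * alpha),
        ("A1", "A3", third * (1 - alpha)),
        ("A2", "A3", third),
        ("A3", "A2", third),
    ]


def prob(ws, event):
    """Probability of an event given as a predicate over (guilty, named)."""
    return sum(p for guilty, named, p in ws if event(guilty, named))


def conditional(ws, event, given):
    """Conventional conditional probability Pr(event | given)."""
    return prob(ws, lambda g, n: event(g, n) and given(g, n)) / prob(ws, given)


def q(guilty, named):
    return named == "A2"  # "The Jailer names A2 as one of the innocent"


def p1(guilty, named):
    return guilty == "A1"


def p3(guilty, named):
    return guilty == "A3"


for alpha in (Fraction(0), Fraction(1, 2), Fraction(1)):
    ws = worlds(alpha)
    print(
        "alpha =", alpha,
        "| Pr(q) =", prob(ws, q),
        "| Pr(q|p3) =", conditional(ws, q, p3),
        "| Pr(p1|q) =", conditional(ws, p1, q),
    )
```

For every admissible $\alpha$ the value of Pr(q | p3) differs from Pr(q), and Pr(p1 | q) sweeps the whole interval between 0 and 1/2 rather than collapsing to a single number.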

Although, as clearly shown by this discussion, the incorrect treatment of the three
prisoner problem fails to invalidate Dempster’s rule of combination, we share the concern of
Pearl and others about its wide misapplication, particularly when used indiscriminately to
generate conditional distributions. In our research, we are endeavoring to extend the original
theory to derive expressions that produce and utilize conditional belief information [16] that
incorporates known dependencies between evidential bodies. These formulas are intended to
provide better interval estimates than the typically uninformative bounds that are supplied
by strict derivation of bounds in the absence of additional information by the expression

$$\mathrm{Bel}(q \mid p) = \frac{\mathrm{Bel}(p \wedge q)}{\mathrm{Bel}(p \wedge q) + \mathrm{Pl}(p \wedge \neg q)},$$

which is mentioned in Dempster’s original paper [4] and that has been the object of recent
concern by several authors [3,7].

In closing, we feel it is important to address other concerns of Pearl, going apparently
beyond the three prisoners problem, about the counterintuitive nature of the “collapse” that
usage of the Dempster formula often produces, which is manifested by production of a single
conditional probability distribution when conditioning multiple members of a family $\mathcal{P}$ of
probabilities over some specific subset $q$. Just as it is true that all members of the family of
distributions

$$\mathcal{P} = \{\mathrm{Pr}_t : t \text{ in } [0,1]\}$$

defined in the set $X = \{a, b, c\}$ by the expression

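$$\mathrm{Pr}_t(x) = \begin{cases} \tfrac{1}{2}\,t, & \text{if } x = a, \\ \tfrac{1}{2}(1 - t), & \text{if } x = b, \\ \tfrac{1}{2}, & \text{if } x = c, \end{cases}$$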

are such that $\mathrm{Pr}_t(\{a, b\}) = \tfrac{1}{2}$, despite their variability over other subsets, it is also true that
an extensive family of distributions may collapse into a single conditional probability without
violating any rational or probabilistic principles. Such “invariants” are, in fact, desirable as
elements that simplify the analysis of an otherwise complex probabilistic problem. For
these reasons, we do not feel that, if Dempster’s conditioning formula is applicable, its
reduction of the variability of probability values should be a particular cause for concern as
to its validity.


5.2 The Spoiled Sandwich 

While discussing the suitability of the calculus of evidence either as a form of generalized
probabilistic calculus, or as a new theory that intends to capture a novel notion of belief,
Pearl [15] again faults the approach for failing to satisfy the following rationality principle
originally stated by Aleliunas [1]:

“If two diametrically opposed assumptions yield two different degrees of belief in a
proposition Q, then the unconditional degree of belief merited by Q should be
somewhere between the two.”


As natural as such a principle might look at first, the following simple and clever example of
Wilson [23] clearly shows that it is neither intuitive nor appealing, pointing, however, to the
pitfalls of creating or supporting one’s favorite scheme on the strength of supposedly rational
axioms.

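Let $X = \{a, b, c, d\}$ with $A = \{a, b\}$ and $B = \{a, c\}$, so that $\bar{B} = \{b, d\}$. Consider the
family of probability distributions in $X$

$$\mathcal{P} = \{\mathrm{Pr}_t : t \text{ in } [0,1]\},$$

indexed by a parameter $t$ in $[0,1]$, and defined by

$$\mathrm{Pr}_t(\{a\}) = \tfrac{1}{2}\,t, \quad \mathrm{Pr}_t(\{b\}) = \tfrac{1}{2}(1 - t), \quad \mathrm{Pr}_t(\{c\}) = \tfrac{1}{4}, \quad \mathrm{Pr}_t(\{d\}) = \tfrac{1}{4},$$

and let

$$\mathrm{Pr}_* = \inf_t \{\mathrm{Pr}_t\}.$$

Then, clearly

$$\mathrm{Pr}_t(A) = \tfrac{1}{2}\,t + \tfrac{1}{2}(1 - t) = \tfrac{1}{2},$$

and, therefore, it is $\mathrm{Pr}_*(A) = \tfrac{1}{2}$. The conditional probabilities $\mathrm{Pr}_t(A \mid B)$ and $\mathrm{Pr}_t(A \mid \bar{B})$ are
given by the expressions

$$\mathrm{Pr}_t(A \mid B) = \frac{\mathrm{Pr}_t(\{a\})}{\mathrm{Pr}_t(\{a, c\})} = \frac{\tfrac{1}{2}\,t}{\tfrac{1}{4} + \tfrac{1}{2}\,t},$$

$$\mathrm{Pr}_t(A \mid \bar{B}) = \frac{\mathrm{Pr}_t(\{b\})}{\mathrm{Pr}_t(\{b, d\})} = \frac{\tfrac{1}{2}(1 - t)}{\tfrac{1}{4} + \tfrac{1}{2}(1 - t)},$$

from which the lower bounds

$$\mathrm{Pr}_*(A \mid B) = \inf_t \mathrm{Pr}_t(A \mid B) = 0,$$

$$\mathrm{Pr}_*(A \mid \bar{B}) = \inf_t \mathrm{Pr}_t(A \mid \bar{B}) = 0,$$

are easily derived. It is clear, however, that

$$\tfrac{1}{2} = \mathrm{Pr}_*(A) > \mathrm{Pr}_*(A \mid B) = \mathrm{Pr}_*(A \mid \bar{B}) = 0,$$

showing that the sandwich principle is violated even within the confines of conventional
probability theory.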
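
The lower envelopes in this example can be verified numerically. The sketch below is illustrative: it assumes the singleton probabilities $t/2$, $(1-t)/2$, $1/4$ and $1/4$ for $a$, $b$, $c$ and $d$ respectively, and scans a finite grid of parameter values that includes both endpoints, where the infima are attained. The names `pr_t`, `prob`, and `conditional` are hypothetical.

```python
from fractions import Fraction


def pr_t(t):
    """Member of the family indexed by t (assumed singleton probabilities)."""
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    return {"a": half * t, "b": half * (1 - t), "c": quarter, "d": quarter}


def prob(dist, event):
    return sum(dist[x] for x in event)


def conditional(dist, event, given):
    return prob(dist, event & given) / prob(dist, given)


A = frozenset("ab")
B = frozenset("ac")
not_B = frozenset("bd")

# Grid over [0, 1] including the endpoints t = 0 and t = 1.
grid = [Fraction(k, 100) for k in range(101)]

lower_A = min(prob(pr_t(t), A) for t in grid)
lower_A_given_B = min(conditional(pr_t(t), A, B) for t in grid)
lower_A_given_not_B = min(conditional(pr_t(t), A, not_B) for t in grid)

print(lower_A, lower_A_given_B, lower_A_given_not_B)
```

The unconditional lower bound stays at one half for every member of the family, while both conditional lower bounds drop to zero (at $t = 0$ and $t = 1$ respectively), so the unconditional value does not lie between the two conditional ones.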
5.3 Other ways to spoil the sandwich

Although such simple examples should suffice to dispose of concerns about spoiled
sandwiches, we feel that Pearl’s discussion of the problem deserves a more detailed analysis,
mainly because of its philosophical implications to rational thinking. This is particularly
important as loose use of such terms as “assured winnings,” “support,” or “belief” in the
absence of a sound formal interpretive framework may quickly mislead those engaged in the
comparison of alternative methodologies.

In an example, called “the Peter, Paul, and Mary Sandwich problem,” Pearl presents a
betting situation where Mary prepares either a ham or a turkey sandwich, promising to pay
Paul $1000 should he guess correctly the type of sandwich that she has prepared. Not having
a clue as to Mary’s choice, Paul then flips a coin, guessing “ham” if the coin turns up heads
and guessing “turkey” if it comes up tails. Paul, as Pearl notes, behaves like an “incurable
Bayesian,” reckoning that

$$\Pr(\text{win}) = \Pr(\text{win} \mid \text{turkey}) \Pr(\text{turkey}) + \Pr(\text{win} \mid \text{ham}) \Pr(\text{ham})$$

$$= \Pr(\text{tails} \mid \text{turkey})\, \alpha + \Pr(\text{heads} \mid \text{ham})\, (1 - \alpha) = \tfrac{1}{2},$$

regardless of the value $\alpha$ of the probability that Mary has actually prepared a turkey
sandwich. Thus, in spite of not being “assured” a win, or having “supporting evidence,” Paul can
invoke the rationality (doubtful, as we already saw) of the sandwich principle and argue that
Paul does not need to engage in unnecessary knowledge acquisition or experimentation [15]:

“If every possible outcome of an experiment would lead you to choose the same action,
then you ought to choose that action without running the experiment.”
From such an observation, Pearl proceeds to fault the philosophical underpinnings of the
belief-function approach, eventually going as far as to suggest that, should Bayesian
orthodoxy be inapplicable, Dempster’s formula (which, he freely admits, does not play any
role in this example) be replaced by other formulas such as the well-known bounds recently
rediscovered by Halpern and Fagin [7].

In the light of our previous example about the rather inconvenient ability of conventional
probability families to spoil sandwiches, all of these pronouncements look increasingly
suspicious. What, however, may we say is wrong? This question may be answered in
two equivalent ways.

We may say first, keeping ourselves at the informal discussion level, that, often, the
experiments may interact with probabilities in complex ways that, obviously, Pearl has not
considered. Nothing in Pearl’s formalism suggests, for example, that the sandwich has
already been prepared and that it may not be artfully substituted by Mary to assure that
Paul always loses, thus invalidating his hopes of having at least a 50% chance of winning.

The second, more formal, rendering of this observation is again based on the semantic
model of Ruspini. In this, and in other similar problems, we have several agents that
deliberate about the state of the world on the basis of their knowledge and knowledge of the
knowledge of others. If the unary operator $K$ represents the state of knowledge of one of
these agents, then, as observed before, our agent is usually in one of three possible
epistemological states with respect to the validity of a proposition $p$: either he knows that $p$ is true
(denoted $Kp$), or he knows that $p$ is false (denoted $K\neg p$), or he may be ignorant of such
truth (i.e., $\neg Kp \wedge \neg K\neg p$, denoted $Ip$).

In standard accounts, assuming that knowledge of the truth of a proposition does not affect the
likelihood of truth of other propositions, we are simply concerned with a single form of
conditional probability: that measuring the likelihood of $p$ being true when $q$ is true. In more
complex epistemological situations, we may need to be concerned with such quantities as
$\Pr(Kp \mid Kq)$, $\Pr(Kp \mid q)$, $\Pr(Kp \mid Iq)$, and the like. In other words, $\mathrm{Bel}(p \mid q)$ measures the
support that knowledge of the truth of $q$ provides to the truth of $p$, rather than the support
provided by the truth of $q$ to the truth of $p$.

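In the Peter, Paul, and Mary sandwich problem, Pearl implicitly assumes that

$$\Pr(K_{\mathrm{Mary}}\,\text{heads}) = 0,$$

$$\Pr(K_{\mathrm{Mary}}\,\text{tails}) = 0,$$

$$\Pr(\text{turkey} \mid I_{\mathrm{Mary}}\,\text{heads}) = \alpha,$$

$$\Pr(\text{ham} \mid I_{\mathrm{Mary}}\,\text{heads}) = 1 - \alpha,$$

concluding correctly, by application of the total probability law over the exhaustive and
exclusive set of possibilities

$$\{K_{\mathrm{Mary}}\,\text{heads},\; K_{\mathrm{Mary}}\,\text{tails},\; I_{\mathrm{Mary}}\,\text{heads}\},$$

that Paul has at least a 50% chance of winning.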

This correct use of the total probability law does not mean that, by contrast, one should
assume that the full extent of the conditional information provided by belief functions is
limited to the conditional support functions

$$\mathrm{Bel}(p \mid q) = \Pr(p \mid Kq), \qquad \mathrm{Bel}(p \mid \neg q) = \Pr(p \mid K\neg q),$$

as Pearl evidently does. In short, not knowing $p$ is not the same as knowing $\neg p$. The example
of the Peter, Paul, and Mary sandwich shows that one needs to consider states of ignorance
that, when properly accounted for, spoil even the best conceived principles of rationality.


To fully appreciate the complexity of the problem, suppose that we change Pearl’s implicit
assumptions, bringing the previously absent Peter into the scene as a spy acting on behalf of
Mary. In this new scenario, still consistent with Pearl’s explicit statement of the problem,
Peter, spying on Paul’s coin flipping experiment, alerts Mary who, being rather artful and
deft of hand, substitutes the sandwich so as to make sure that Paul always loses. In this
case,

$$\Pr(\text{ham} \mid K_{\mathrm{Mary}}\,\text{tails}) = 1, \qquad \Pr(\text{turkey} \mid K_{\mathrm{Mary}}\,\text{heads}) = 1,$$

and, most importantly,

$$\Pr\bigl( (K_{\mathrm{Mary}}\,\text{heads}) \cup (K_{\mathrm{Mary}}\,\text{tails}) \bigr) = 1,$$

i.e., Mary is never ignorant as to what Paul will bet.
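
The contrast between the two scenarios can be made concrete with a small Python sketch. It assumes a fair coin (consistent with the value 1/2 obtained above) and describes each scenario by the probability that the sandwich is turkey given the outcome of the flip; the function name `win_probability` and its parameters are hypothetical.

```python
from fractions import Fraction


def win_probability(pr_turkey_given_heads, pr_turkey_given_tails):
    """Paul's chance of winning when he bets 'ham' on heads, 'turkey' on tails.

    A fair coin is assumed; the two arguments describe how the sandwich
    depends (or not) on the outcome of the flip.
    """
    half = Fraction(1, 2)
    win_on_heads = 1 - pr_turkey_given_heads  # Paul guessed "ham"
    win_on_tails = pr_turkey_given_tails      # Paul guessed "turkey"
    return half * win_on_heads + half * win_on_tails


# Pearl's implicit scenario: Mary is ignorant of the flip, so the sandwich
# is turkey with the same probability alpha whatever the coin shows.
for alpha in (Fraction(0), Fraction(1, 3), Fraction(1)):
    print("ignorant Mary, alpha =", alpha, "->", win_probability(alpha, alpha))

# Spy scenario: Mary knows the flip and always serves the other sandwich.
print("informed Mary ->", win_probability(1, 0))
```

Only the epistemic state of Mary differs between the two calls, yet the chance of winning moves from one half to zero.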

The Peter, Paul, and Mary sandwich example does not, in our view, invalidate the 
applicability of the evidential approach but rather highlights the need to make necessary 
discriminations between propositional truth, knowledge of that truth, and the interplay 
between such conditions that are likely to be glossed over by cursory analyses based on 
conventional approaches. 

5.4 The Disagreeing Experts 

Another common misunderstanding regarding the role of Dempster’s combination
formula is that provoked by an example of Zadeh [24] that, although originally formulated to
illustrate problems with its misapplication, is often described as an indication of theoretical
inadequacy.

Zadeh’s example concerns two “experts” that assess, in a rather conflicting fashion, the
likelihood of three noncompatible events $A$, $B$ and $C$ as shown in Table 2. Representation
of each of the experts’ assessments as a mass distribution followed by their combination with
Dempster’s rule yields $\Pr(B) = 1$, indicating that the “true” event is $B$, an alternative
considered to be rather unlikely by either of the assessors.


Observer   Pr(A)   Pr(B)   Pr(C)
1          0.99    0.01    0
2          0       0.01    0.99

Table 2: Experts Disagree on the State of the World


Although this example is often quoted as an example of the failure of Dempster’s rule,
it is clear that each of the rows in Table 2 defines a conventional probability distribution, thus
suggesting that the problem is likely to lie elsewhere. While one may be tempted to defend
any method of evidence combination by saying that the evidence, however peculiar, indicates
that Observer 1 is ruling out alternative $C$ while Observer 2 is excluding alternative $A$, thus
leaving only $B$ as the sole possible answer, it is clear, upon further examination, that the
rows of Table 2 cannot possibly be evaluations of the same probability distribution. If that
were the case, then at least one of the experts must be wrong, since there can only be one
correct probability distribution, contradicting the assumption that they are both reliable.
Clearly, if the example is to make any sense under any type of probabilistic interpretation,
each row must correspond to a different conditional probability where the conditions
correspond to different observations available to each expert. A simple example, suggested by a
recent example used by Kyburg [9] to address other probabilistic reasoning issues, will help
to clarify matters.

In this example we are being asked to reason, on the basis of available evidence, about 
the taste and edibility of certain berries that may be either small or large, red or blue, have 
good or bad taste, or be safe or poisonous to eat. We will assume that the berries in question 
are distributed according to the distribution shown in Table 3. 


Berries
Color   Size    Taste/Edibility   Probability
Red     Small   Good/Edible       99/199
Blue    Large   Bad/Edible        99/199
Red     Large   Poisonous         1/199

Table 3: The Berries Probability Distribution


If now a berry is picked up and found by an expert to be large, he will correctly conclude
from such evidence that

$$\Pr(\text{Good} \mid \text{Large}) = 0, \quad \Pr(\text{Poisonous} \mid \text{Large}) = 0.01, \quad \Pr(\text{Bad Taste} \mid \text{Large}) = 0.99.$$

Another expert, noticing that the berry is red, will conclude, on the other hand, that

$$\Pr(\text{Good} \mid \text{Red}) = 0.99, \quad \Pr(\text{Poisonous} \mid \text{Red}) = 0.01, \quad \Pr(\text{Bad Taste} \mid \text{Red}) = 0.$$

Clearly the evidential implications of these two separate observations are identical to the
situation summarized in Table 2. Examination of Table 3, however, reveals that

$$\Pr(\text{Poisonous} \mid \text{Red}, \text{Large}) = 1,$$

a correct solution that must rationally be expected from any reasoning method that
purports to be valid.
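
These values can be recomputed from Table 3 with a brief Python sketch. It is illustrative only: the names `BERRIES`, `OUTCOMES`, and `conditional` are hypothetical, the label "Bad Taste" stands for the Bad/Edible row, and the pooling step is the pointwise product followed by renormalization, which is what Dempster's rule reduces to when all mass sits on single outcomes.

```python
from fractions import Fraction

# Rows of Table 3: (color, size, outcome, probability).
BERRIES = [
    ("Red", "Small", "Good", Fraction(99, 199)),
    ("Blue", "Large", "Bad Taste", Fraction(99, 199)),
    ("Red", "Large", "Poisonous", Fraction(1, 199)),
]
OUTCOMES = ("Good", "Bad Taste", "Poisonous")


def conditional(outcome, **observed):
    """Pr(outcome | observed), where observed may fix color and/or size."""

    def matches(color, size):
        return (
            observed.get("color", color) == color
            and observed.get("size", size) == size
        )

    mass = sum(p for c, s, o, p in BERRIES if matches(c, s))
    hit = sum(p for c, s, o, p in BERRIES if matches(c, s) and o == outcome)
    return hit / mass


large = {o: conditional(o, size="Large") for o in OUTCOMES}
red = {o: conditional(o, color="Red") for o in OUTCOMES}
both = {o: conditional(o, color="Red", size="Large") for o in OUTCOMES}

# Pooling the two experts' assessments: pointwise product, then renormalize.
products = {o: large[o] * red[o] for o in OUTCOMES}
pooled = {o: products[o] / sum(products.values()) for o in OUTCOMES}

print("large:", large)
print("red:", red)
print("pooled:", pooled)
print("direct:", both)
```

In this example the pooled assessment agrees with direct conditioning on both observations. The renormalization divides by a very small total product, which is the practical warning sign when reference classes differ widely.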

The solution to the puzzle of the disagreeing experts lies in recognizing that there is,
in fact, no disparity of opinion among them. Each is providing quantitative measures of
likelihood with respect to different reference classes. The Dempster formula, as observed
by Zadeh, should never be applied to pool partial information about the same probability
distribution. Furthermore, as shown by a sensitivity analysis of the results of its application
to the berries example, its usage in situations where there is considerable disparity between
reference classes (as suggested by the large normalization factor) should be discouraged on
the basis of practical rather than conceptual considerations.

6 On Complexity and Generality 

The potential complexity of the belief-function approach to represent and manipulate interval 
constraints on a family of probability distributions has been often mentioned as a handicap 
of the evidential reasoning methodology. In spite of such misgivings, two major empirical 
observations have indicated that the approach is applicable to a wide variety of practical 
problems. 

First, our experience shows that, notwithstanding criticisms based on unrealistic
worst-case scenarios, the approach is computationally efficient. In particular, we have found that
representation of belief functions in terms of basic probabilistic assignments results in a
storage and manipulation scheme that is both economical and easy to understand. In addition,
we have successfully implemented tools, such as summarization and coarsening operators,
which may be effectively utilized to limit representational complexity.

Second, our current functional operators have been chosen to guarantee that probabilistic 
information will always be capable of being represented within the scope of the approach, 
as more general constraints do not either enter into consideration or appear as the result of 
any of its functions. 

The lack of generality of the belief-function approach to represent general lower-upper 
probability constraints is well known [10]. Our reliance on the methodology is primarily 
the result of practical considerations: while we would prefer to manipulate more general 
constraints on probability values, compelling computational efficiency arguments force us to 
limit the scope of the problems considered to those capable of being at least approximately 
solved by a belief-function treatment. 

Being, in general, partial towards interpretations of evidential structures that are fully
compatible with probability theory, our current research is being directed toward the
development of more general, yet efficient, representation and manipulation methods.

Our current concerns with the manipulation of conditional and dependent evidence (i.e.,
the evidential counterpart of conditional probabilities) show, for example, that, for some
important problems, the results of evidential combination fall outside the scope of its
representational capabilities. In our experience, these methodological limitations are more
worrisome than any of the supposedly paradoxical results arising from its misuse or its claimed
lack of a decision-making apparatus.

Preliminary results [16] indicate, on the other hand, that the belief-function approach
may be used to approximate the results of these evidential combination operations and
that extended representation mechanisms [20] may yet be developed to treat more general
evidential problems. This research also shows the basic errors inherent in criticisms that
regard the belief-function approach as a fully developed methodology incapable of sustaining
further enhancement and modification. Having been studied in depth for only fifteen years,
its technological status is that of a young discipline, capable both of enhancement on its
own and of combination with other approaches to produce more general tools for probabilistic
reasoning. Far from proving that we have reached a technological plateau, our investigations
indicate that much is yet to be gained from such a development and integration process.